ABSTRACT



This thesis describes the design of a built-in self-test capability 
for a military airborne digital computer. The supportive investigation 
of program constraints and their effects on the example test design is
intended to give broad perspective to the general self-test design 
problem. Alternate procedures for achieving the goal of airborne 
detection and isolation of a certain class of failures to the modular 
level are surveyed. A specific test design is evolved illustrating the
unique mix of program-oriented, periodic techniques, and added hardware, 
continuous techniques best suited to the example development program. 

The test design is evaluated and further work is suggested. 
I. INTRODUCTION



Maintenance and repair of faulty electronic equipment have always
been the less glamorous companions of design and operation. Indeed, 
the subjects were often broached only after design concepts were 
formed and specific circuitry developed. The evolution of increasingly
complex electronic systems, such as digital computers, has forced 
greater and earlier consideration of the problems of locating failures 
and correcting them. A digital computer which automatically tests 
itself for proper operation and which provides valuable information to 
facilitate maintenance and repair has become very attractive for mili- 
tary and space systems applications. This thesis reports the results 
of an investigation to provide such automatic self-checking for a 
digital computer system. 

A project which considerably supported the investigation was accom- 
plished at Hughes Aircraft Company, Culver City, California, during an 
industrial experience tour. The project goal of designing a built-in 
self-test (BIT) capability for an advanced airborne digital computer 
system for military application was more fully realized because BIT 
was accepted as a principal design consideration early in the
architectural design procedure. The specific design developed will be used
as an example; however, the test procedures will be recognized as 
being more generally applicable to the class of digital computers for 
which the assumptions and constraints applied herein can be validated. 
Only one of many possible solutions to the fault detection and isolation 
problem will be presented. The choice made should not be construed to 
reflect official policy at Hughes Aircraft Company. 
Some general comments at the outset should place this investigation
in proper perspective and temper expectation with pragmatism. The
investigation has as its central focus the specific BIT design developed;
however, it is intended to consider the broader systems design options 
available, thereby showing the example design in better perspective. 

As Sellers, Hsiao and Bearnson [Ref. 43] so aptly observe, one should 
initially set reasonable design objectives relative to the thoroughness 
of test, recognizing that exhaustive automatic test is an almost 
unattainable practical goal. As part of a computer development pro- 
gram, the BIT design is subject to the larger program objectives and 
constraints. The first part of this investigation will define the test
design problem in more specific terms. Subject to practical limitations 
a reasonable set of test objectives will be developed. Once objectives 
have been focused, alternatives for implementation will be considered 
and a test concept evolved. Specific test procedures will be presented 
for automatically testing the digital computer. Finally, the results 
obtained will be critically evaluated in light of the design objectives, 
and further related work will be suggested. 
II. PROBLEM DEFINITION AND DESIGN OBJECTIVES



A. BROAD GOALS 

Given the framework of a digital computer in a military avionics 
application, one can identify three broad goals for a self-test 
capability: 

1. To decrease the cost of ownership by reducing maintenance cost/time
and increasing system availability.

2. To indicate to the pilot in flight the level of system operational 
capability available to him.

3. To provide limited assistance through self-test in prototype 
design and checkout. 

Any information relative to the existence and location of failure will 
reduce the time spent (and hence cost) to repair the computer and
therefore increase the aircraft's availability for operational purposes.
Airborne indications of system degradation through failure allow the 
pilot to make timely and informed choices of alternatives to optimize 
the probability of successful mission completion. Lastly, self-test 
during computer development assists the engineer to more quickly iden- 
tify and correct design and hardware faults. In short, BIT is designed 
to provide a greater system effectiveness at a lower cost; that is, to 
increase cost-effectiveness. 

B. PROGRAM CONSTRAINTS 
1. Cost of BIT

In a very real sense, the dominating factor affecting BIT design
problem definition is cost. Cost has several facets. The cost of BIT 
is considered to be part of the overall computer program price tag. 
Required performance criteria for the completed computer system are
specified by the sponsoring government agency to the aerospace industry.
A participating company must strive to reduce its proposed system's 
cost while meeting or exceeding specifications to remain competitive. 

So within the overall program development and production cost, the 
contribution of BIT must be justified and minimized. Since the broad 
goal of increased cost-effectiveness has been identified for BIT, 
justification includes critical assessment of the added cost to the 
computer program of providing a self-test capability to ensure that a 
compensatory benefit in reduced cost of ownership will be realized. 

Sources of added cost for BIT include but are not limited to
the following:

1. The checking hardware itself 

2. Additional power required 

3. Greater capacity logic to provide for the added checking 
hardware; e.g., drivers with greater fanout 

4. Additional data lines to provide for test hardware and 
procedures 

5. Storage capacity required for BIT routines and data 

6. Design, programming, and development costs 

Other "costs", often translated into dollar values, include the 
penalties (if any) attached to increased size and weight of an airborne 
computer provided with BIT capability. For an air superiority fighter
application, these penalties are severe.^1

^1 Hughes Aircraft Co. uses internally generated weighting factors
of $500/lb and $5000/ft^3 for added hardware. To illustrate using
these typical penalties, two computers are compared:

1) A 0.5 ft^3, 25 lb computer costing $50k

2) A 0.4 ft^3, 20 lb computer costing $52.5k

The penalties added to computer (1) are:

0.1 ft^3 x $5000/ft^3 = $ 500 for volume
5 lb x $500/lb = $2500 for weight
Total = $3000 penalty

Computer (2), though ostensibly costing more, is $500 less expensive
after penalties are applied.
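
The penalty arithmetic above can be checked with a minimal sketch. The
function name penalized_cost is hypothetical, and the sketch applies the
quoted weighting factors to the full size and weight of each computer
rather than to the differences alone; the $500 margin is the same either
way.

```python
# Weighting factors quoted in the footnote (dollars per unit of added hardware).
WEIGHT_PENALTY_PER_LB = 500.0
VOLUME_PENALTY_PER_FT3 = 5000.0


def penalized_cost(price, volume_ft3, weight_lb):
    """Return purchase price plus size and weight penalties, in dollars."""
    return (price
            + volume_ft3 * VOLUME_PENALTY_PER_FT3
            + weight_lb * WEIGHT_PENALTY_PER_LB)


computer_1 = penalized_cost(50_000.0, 0.5, 25.0)
computer_2 = penalized_cost(52_500.0, 0.4, 20.0)

# Only the difference between the two totals matters for the comparison.
print(round(computer_1 - computer_2, 2))
```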
The benefits of BIT can also be reduced to monetary terms by
operations analysis techniques. Projected maintenance experience, spare
parts costs, inventory levels, and the effects of maintenance concepts
can all be given dollar values. However, the relative weight that
increased system operational availability receives is more subjective.
In a space system, for example, there is a very high premium on
availability; in a military airborne system, availability is important but
not as critical.

The result on the overall cost of ownership for the military 
system is that, while the penalties for providing BIT are quite clear, 
the benefits are harder to evaluate and therefore less visible. Even 
when a clear long-term reduction in cost of ownership can be expected, 
insufficient available funding may force procurement of a less expen- 
sive option without a BIT capability. The effect on BIT design is to 
place emphasis on minimizing the more visible penalties, reducing them
to an acceptable fixed percentage of the system cost without a BIT
capability.^2

^2 Estimates in the literature range from a 3% cost increase for BIT
for a commercial machine to over 300% for a triplicated space
system computer. A figure of 10% fell in the general area of
acceptability at Hughes Aircraft Company for this project.

2. The Parent Computer

The nature of the computer for which the self-test capability
is to be provided certainly has a large influence on the BIT design
objectives. For the example design, the characteristics for the
parent computer evolved from the original specifications and the
subsequent company policy decisions. The parent computer was to:

1. Have a military avionics application

2. Be modular 

3. Have flexible word length 

4. Be non-redundant 

5. Be repaired on the ground, not in the air

6. Have minimal storage capacity 

7. Suffer no operational degradation because of BIT 

8. Be developed on short schedule at low risk 

Each of these characteristics will be more thoroughly discussed. 

A military avionics application implies that size and weight
are to be minimized consistent with the cost penalties discussed earlier.
It also implies high speed, real-time computation. The more rigid
military specifications concerning operating temperatures, humidity, 
shock resistance and other severe environmental factors affect the 
quality of components used and the packaging of these components at 
all levels. 

The computer was to be of modular construction, the term module
referring to a standardized plug-in circuit card with a given surface 
area and number of pin connectors. The Naval Avionics Facility 
Indianapolis (NAFI) has developed a series of modules designed to be 
acceptable as the basic building blocks for many military applications 
[Ref. 10]. The basic "NAFI module" chosen for use in the parent com- 
puter (with some modifications) was the "2A" size whose important fea- 
tures relative to BIT design are dimensions of roughly five (5) inches 
in length and two (2) inches in height (both sides may be used for 
mounting hardware) and 30 pins in the two bottom connectors. Figure 1, 
derived from Ref. 10, depicts the 2A NAFI module. The module's surface 
area and number of pins place limitations on (1) the amount of hardware 
which will physically fit on the module (heat dissipation is a related
problem), and (2) the number of external, intermodular electrical
paths available. The level of solid state technology of the implementing
circuitry determines whether the area or pin limitation dominates. For 
example, circuitry consisting of discrete components (separate trans- 
istors, capacitors, resistors) tends to impose an area limitation because 
the relatively large size of individual components limits the number 
which can be accommodated in the fixed area, before the available pin 
connectors are exhausted. At the other extreme, circuitry implemented 
using large scale integration (LSI) technology, in which perhaps 1000 
or more gates are placed on a single silicon chip [Ref. 48], requires 
little mounting surface area. The number of external connections 
needed, however, can be large. Hence, in the latter case a pin limi- 
tation exists. In between these extremes fall the integrated circuit 
(IC) and medium scale integration (MSI) technological levels which may 
be area or pin limited for specific modules. The size of the modular 
partition chosen for the parent computer and the predominantly IC/MSI 
technology utilized will be seen to have a significant effect on BIT 
design. 

Partitioning of the parent computer was not otherwise speci- 
fied, except that the computer's basic design was to be readily 
adaptable for differing word length applications (specifically, 
multiples of eight bits, up to a 32-bit word length) without major 
redesign of the original modules. The expected initial application
of the parent computer specified a 24-bit word length; this word length
will be used in the example design. 
The parent computer was to be essentially non-redundant; that
is, no general replication of hardware at any level was intended. This
constraint arose from cost considerations. Penalties in the additional
hardware cost, increased size and weight associated with redundancy
were deemed unacceptable. Additionally, the mean time between failures
(MTBF) of the computer tends to be several times higher than the MTBF
of the equipment which the computer serves; e.g., a radar.^3

A closely related characteristic dictated ground repair of 
failures. No automatic reconfiguration under failure or fault-masking 
was intended, since such self-repair generally requires some redun- 
dancy. Airborne personnel to effect maintenance would not be available 
in the type aircraft for which application was projected. Access, 
removal of shielding, and dust-free repair would be difficult airborne. 
Built-in test was therefore restricted to detection and isolation of 
faults, and was not intended to include a self-repair capability. 

The requirement for minimal storage capacity was again related 
to cost. Random access storage such as core memory is expensive in 
hardware, size, weight, and power requirements. No peripheral bulk 
storage devices such as drum, disc, or tape were to be available. The 
effect of these characteristics of the parent computer on the design 
of BIT is significant. The dedication of memory bit locations to
storage of error detecting codes, such as parity or residue, is
eliminated from consideration because of the attendant reduction in word
length available to the flight program. Increased word length is
unacceptable because of the greater storage requirement and higher cost.
This is a notable restriction, since coding is a widely used technique
for detecting data transfer errors. The storage of software self-test
programs and data in core memory is also virtually eliminated from the
list of often-used test tools. The core memory, then, is reserved for the
flight program and for operational use, with negligible capacity
available for BIT use.

^3 Reference 34 shows MTBF's in the 100's of hours for the F-111A
weapons system avionics equipments. MTBF's for airborne computers,
as shown by marketing brochures, are typically in the 1000's of hours.
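
To make the storage penalty of coding concrete, the following is a
minimal sketch of a single even-parity bit carried inside the 24-bit
word of the example design. The packing layout (parity in the least
significant position) and the function names are assumptions made for
illustration only; the point is that the flight program is left with 23
usable bits per word.

```python
WORD_BITS = 24  # word length of the example design


def even_parity_bit(data, data_bits):
    """Return the bit that makes the total number of ones even."""
    return bin(data & ((1 << data_bits) - 1)).count("1") % 2


def encode(data):
    """Pack 23 data bits plus one parity bit into a 24-bit word."""
    data_bits = WORD_BITS - 1
    data &= (1 << data_bits) - 1
    return (data << 1) | even_parity_bit(data, data_bits)


def parity_ok(word):
    """A stored word is consistent if it holds an even number of ones."""
    return bin(word).count("1") % 2 == 0


word = encode(0b1011)
print(parity_ok(word))          # fault-free word
print(parity_ok(word ^ 0b100))  # same word with one bit in error
```

Any single-bit error changes the count of ones from even to odd and is
therefore detected; the price is the memory bit given up in every word.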

Any self-test capability is not allowed to degrade the real-time 
operational efficiency of the computer in speed or availability. The 
effect of this requirement is to prohibit the insertion of test hard- 
ware in operational propagation paths because of the delays thereby 
introduced. Additionally, any sequential, program-oriented test routines
would have to be exercised on a time-shared basis with ongoing tactical
operations in available short blocks of "idle" time. Such routines would
therefore have to be interruptible without destroying test efficacy
so that the machine could be returned to operational computation
immediately, whenever required.
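
The interruptibility requirement can be sketched in software terms as
follows. This is an illustration only, not the test design of this
thesis: the step names and checks are hypothetical, and a Python
generator stands in for whatever mechanism preserves the position in the
test sequence between blocks of idle time.

```python
def bit_sequence(steps):
    """Yield (name, passed) after each test step so the caller can pause."""
    for name, check in steps:
        yield name, bool(check())


def run_in_idle_time(sequence, steps_available):
    """Advance the test by at most `steps_available` steps; return results."""
    results = []
    for _ in range(steps_available):
        try:
            results.append(next(sequence))
        except StopIteration:
            break  # test sequence complete
    return results


# Hypothetical test steps; each returns True when the check passes.
steps = [
    ("add", lambda: 3 + 4 == 7),
    ("shift", lambda: (1 << 3) == 8),
    ("complement", lambda: (~0x0F & 0xFF) == 0xF0),
]

sequence = bit_sequence(steps)
first_block = run_in_idle_time(sequence, 2)   # interrupted after two steps
second_block = run_in_idle_time(sequence, 2)  # resumes with the third step
```

Because each step completes before control is returned, an interruption
between steps loses no test results and repeats no work.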

The overall computer program called for a short development 
schedule with low risk to the company. These constraints dictate the 
use of existing techniques and designs wherever feasible. No completely 
new technology could be developed within schedule requirements. Off- 
the-shelf hardware components would be primarily used because of the 
risks attendant in meeting a short schedule with components potentially
available from outside suppliers at production time but still under 
development during computer design. 
C. BIT DESIGN OBJECTIVES



With the aforementioned broad goals set forth and the constraints 
imposed on self-test design by the nature of the larger program more 
clearly defined, realistic BIT design objectives can be developed. The 
maintenance problem would be most significantly assisted if faults 
could be isolated to the plug-in card, or modular level. Sub-modular 
fault isolation, while desirable from the standpoint of higher echelon 
maintenance, does not contribute any more significantly to increased 
aircraft availability since the faulty module must be removed in either 
case. Conformal coating for environmental protection applied to
circuitry within the module makes removal of sub-modular components a
difficult and specialized task inappropriate at the immediate squadron
(1st echelon) level. Higher level isolation would require replacement
of larger and more expensive units of the computer. Stocking of spare
parts at the module level seems reasonable for the squadron shop both
in the inventory costs involved and the volumes required. Of course,
commonality among modules reduces the different types to be stocked and
is desirable. These heuristic arguments can be quantified, but the views
presented should suffice to intuitively support the decision to set
fault isolation to the modular level as a BIT design objective.

Since no airborne repair, manual or automatic, is required, 
reporting of faults detected within specific modules completes the 
self-test task. A compatible design objective, supporting the second
and third broad goals related to pilot notification of failure and aid
to prototype development, is to rapidly indicate the specific modular
location of existing faults to a central location for immediate use by
the pilot and later use by maintenance personnel upon mission
termination.

The BIT design objectives and major constraints can now be summa- 
rized. The BIT design should automatically detect failures in the 
computer and isolate them to the modular level airborne. The modular 
location of such failures should be rapidly reported to a central loca- 
tion. The design should be minimized as to cost, require negligible 
core memory storage, utilize no coding techniques requiring storage 
capacity, and inflict no operational degradation on the computer's speed
and availability. All this should be accomplished on short schedule 
and at low risk. While these objectives and constraints for a self- 
test design are imposing, they are not atypical of the requirements 
of a military airborne system. Just what constitutes the failure to 
be detected can now be examined. 
III. THE NATURE OF FAILURE



Since the objective of "fault detection" has been set, its meaning 
should be explained. This section will consider what constitutes a 
fault and will define several related terms. The literature is replete 
with descriptive terms such as catastrophic, intermittent, solid, 
transient, burst, marginal, multiple, incipient, minor, and gross, applied
to fault and the related terms failure, error and malfunction.^4 The
terms "fault," "failure," and "malfunction" will be used synonymously 
to mean a physical defect in equipment which causes that equipment to 
perform in an unsatisfactory manner. The substandard performance 
usually resulting from a fault will be termed an "error." Another way 
of stating this is to say that an error is an incorrect result. The 
terms "solid" and "intermittent" will be used to characterize the dura- 
tion of the error, and by inference, the failure causing the error. 

A solid error will refer to an error which results from a failure which 
persists; a solid error will consistently recur under the same equip- 
ment conditions. An intermittent error will be one which is of short or 
transient duration and is non-persistent; that is, an intermittent 
error does not consistently recur given the same conditions. The terms 
"catastrophic" and "transient" are often used to describe these two 
categories of error, but they will not generally be used herein. The 
idea of degrees of failure is introduced by such terms as marginal,
single or multiple, minor or gross. The term "marginal" will be
reserved to describe a category of testing. The terms "single" and
"multiple" will refer to one failure or error, and to more than one
failure or error, respectively.

^4 A good discussion of some typical terminology surrounding "failure"
is found in Ref. 24.

Erroneous results can arise from sources other than equipment 
failure. Programming inaccuracies and human operator mistakes will 
not be considered to be errors within the scope of this investigation.
Equipment failure leading to erroneous results represents the class of 
faults to be detected by the design test techniques. Inaccurate intra- 
computer data transmission, faults in logic, failures in core memory, 
and failed test circuitry are representative of faults within this 
class of interest. 

Certain types of equipment, generally termed "hard-core," serve 
the entire computer and must operate properly if the computer is to 
function at all. Examples of such equipment are main power supplies, 
clocking circuitry, cooling equipment and other mechanical components 
such as electromagnetic interference shielding. Faults in this hard- 
core equipment have been effectively identified by voltage/temperature 
sensing devices which continuously compare performance to preset
tolerances, and similar well-known techniques [Ref. 46]. Faults in the types
of hard-core equipment described above will not be considered to be part 
of the BIT detection and isolation task as defined herein. The main 
thrust of this investigation will treat the less adequately resolved 
problems of identifying and locating all possible failures in the logic
circuitry, storage, data transmission paths, checking hardware and
other equipment which is not hard-core in the previous sense of
providing "housekeeping" and utility services.^5

Faults are usually identified by detecting the resultant errors.
If a fault does not produce erroneous results, its existence is of
little immediate consequence. For example, a shorted transistor always
causing an output to be in the low voltage level (the zero of positive
logic having the binary logical states one and zero) does not become
significant until the high voltage level represents the proper output
value. In other words, a stuck-at-zero failure is not important until 
the proper result should be a logical one. Conversely, as previously 
mentioned, not all errors are the result of equipment failure (e.g.,
operator mistakes), but some of these appear to be the result of equip- 
ment failure. Equipment failure modes should be examined to identify 
those of interest to the test design. 
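
The observation that a stuck-at fault matters only when the proper
output differs from the stuck value can be illustrated with a minimal
sketch. A two-input AND gate is assumed purely for illustration; the
function names are hypothetical.

```python
from itertools import product


def and_gate(a, b, stuck_at=None):
    """Two-input AND gate; `stuck_at` forces the output to 0 or 1."""
    return stuck_at if stuck_at is not None else a & b


def detecting_inputs(stuck_at):
    """Return the input pairs whose output differs from the fault-free gate."""
    return [(a, b) for a, b in product((0, 1), repeat=2)
            if and_gate(a, b) != and_gate(a, b, stuck_at)]


print(detecting_inputs(0))  # [(1, 1)]
print(detecting_inputs(1))  # [(0, 0), (0, 1), (1, 0)]
```

For this gate, an output stuck at zero produces an error only for the one
input combination whose proper result is a logical one, so a test must
apply that combination to reveal the fault.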

Assuming transistor building blocks (discrete, IC, MSI, or LSI 
technology) for the example computer logic (vice cryogenics or some
other technology), some of the possible failure modes are: 

1. Inputs or outputs stuck at the high or low voltage levels
(stuck-at-one, stuck-at-zero). Inputs stuck above the high
level or below the low level, a possible condition in some
computers, have the same effect.

2. Inputs or outputs stuck at an indeterminate, intermediate
level between the high and low voltage levels. Indeterminate
voltage levels might sometimes be interpreted as a one, and
sometimes as a zero.

3. Deteriorated component response to inputs or weakened drive
capability of outputs.

^5 The term "hard-core" will later also be applied to some equipment
within this group subject to test, but in a different sense.

The first failure mode is the one of greatest interest for the 
subject test design because such persistent failures result in solid 
errors susceptible to detection and isolation. 

The second failure mode, inputs or outputs stuck at an indeterminate
voltage level, might lead to no error if properly interpreted,
intermittent error if interpreted differently at different times, or solid
error if consistently misinterpreted. An assumption which is often
made in deriving a diagnostic scheme is to disallow the second failure
mode.^6 Another way of stating this is to assume that logic fails to
one of the two logic levels, one or zero, and not to some intermediate
level. The assumption can be validated by setting a voltage threshold
above which results will be interpreted as one logical state, and below
which results will be interpreted as the other logical state. The
assumption of disallowing the second failure mode will be made for the
test design.^7

The third failure mode could result in solid or intermittent 
errors depending on the consistency of the erroneous results and the 
duration. For example, a weak driving capacity of an output feeding
several subsequent inputs (fan out) could result in some inputs
receiving a zero and others a one. This would be a solid error if the same
inputs always received the same signal under the given conditions. An
intermittent error would result if, for example, a driven input received
a logical one in one instance and a zero in another for the same driving
output value. The third failure mode is considered part of the test
problem. It will be discussed again under the topic of marginal testing
in Section IV.

^6 For example, see Ref. 31.

^7 This assumption is occasionally not made. For example, one scheme
which relies on circuitry which fails to a NULL state intermediate
between one and zero is described by Connolly and Schmitt [Ref. 8]. The
assumption of failure to one or zero is far more common.

Intermittent errors should be discussed more fully, as they are 
sometimes part of the test problem and sometimes not. Some physical 
causes of intermittent errors are: 

1. Dirty connectors - a small smudge of oil or dirt on a pin might 
be sufficient to intermittently block the low current levels 
typically found in intermodular lines. Vibration can provide 
slight shifts in the contact surfaces sufficient to make or 
break contact. 

2. Temporary overheating of hardware regions - when not persistent, 
such transient environmental conditions can cause intermittent 
erroneous results. 

3. Loose connections or particles between circuits or within
hardware packages - vibration can cause open and closed circuit
conditions intermittently.

4. Unusual electromagnetic interference (EMI) or coupling - spikes
coupled into the circuitry from outside, or appearing through
the power supply, can cause changes in state resulting
in erroneous performance.

5. Drifting characteristics - aging or deteriorating components
or changing environmental conditions can cause varying and
inconsistent performance changes in circuitry.

While the above list is certainly not complete, it does serve to 
illustrate the many sources of intermittent error, and to suggest the 
difficulty of detecting and isolating the causes of such errors. Those 
causes not representing hardware failure, such as dirty connectors or 
unusual EMI, can cause erroneous results which falsely indict
fault-free circuitry (which, when faulty, exhibits the same symptoms). Such
causes of faulty performance are important because even one state
change affecting a logical decision within the machine can produce
catastrophic results. While intermittent errors caused by other than 
hardware failure have been excluded from the test problem, test pro- 
cedures must endeavor to ensure they are, in fact, excluded. A proce- 
dure which signals hardware failure when none exists not only reduces 
the level of confidence accorded error signals, but also increases 
cost, in direct opposition to BIT objectives, by causing fault-free 
circuitry to be replaced. 

The degree, or extent, of failure is also important to test design. 
Single failures are inherently easier to detect and isolate than 
multiple failures; the detection problem is smaller. Additionally, 
multiple solid failures can have the property of occasionally masking 
each other, giving the appearance of intermittent single failure. To 
reduce the test problem to reasonable limits, the assumption that there 
exists at most a single failure in a computer to be tested is often 
made. The validity of the "single failure assumption" will be examined 
relative to the example BIT design as a possible means of reducing the 
quantity of added hardware required to give sufficient test effectiveness 
within acceptable program bounds. 
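
The masking property of multiple solid failures can be seen in a small
sketch. An exclusive-OR gate with stuck inputs is assumed here only as a
convenient illustration; the function names are hypothetical.

```python
from itertools import product


def xor_gate(a, b, a_stuck=None, b_stuck=None):
    """Exclusive-OR gate whose inputs may each be stuck at 0 or 1."""
    a = a if a_stuck is None else a_stuck
    b = b if b_stuck is None else b_stuck
    return a ^ b


def error_inputs(**faults):
    """Return the input pairs for which the faulty gate gives a wrong result."""
    return [(a, b) for a, b in product((0, 1), repeat=2)
            if xor_gate(a, b) != xor_gate(a, b, **faults)]


print(error_inputs(a_stuck=1))             # [(0, 0), (0, 1)]
print(error_inputs(a_stuck=1, b_stuck=1))  # [(0, 1), (1, 0)]
```

With the first input alone stuck at one, the input pair (0, 0) produces
an error. When the second input is also stuck at one, the same pair
yields the correct result, so the second failure masks the first for
that test condition.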

The components used in modern military/space systems are designed 
to have high individual component reliability. Low power silicon 
transistors in the Raytheon equipment used in Apollo and Polaris
programs, for example, were found to have a failure rate of 1.4 x 10^-5
failures/1000 hours [Ref. 40]. If multicomponent packages such as IC's
are used, the interconnections between components on the same silicon
chip are more reliable than in the discrete component case. Overall
equipment reliability can therefore be expected to go up through the
use of integrated circuits [Ref. 29]. Figures provided from a variety
of aerospace suppliers from 1964 to 1966 show failure rates for integrated
circuits from 7 x 10^-5 failures/1000 hours to 5.2 x 10^-4 failures/
1000 hours [Ref. 19]. Brauer has reported integrated circuit failure
rates varying from 7 x 10^[exponent illegible] failures/1000 hours to
6 x 10^[exponent illegible] failures/1000 hours [Ref. 4]. Infant mortality
failures and adolescent failures, usually occurring during burn-in and
testing at the factory, exceed the exponential failures (constant failure
rate) more common in an operationally deployed unit. This partially
accounts for the diversity in the cited failure rates, and emphasizes the
need to know failure rate sources and conditions for proper
interpretation. The point to be made is that even the most pessimistic of
the cited figures shows that a long operating life can be expected from
modern components.

The MTBF of a computer considers all the different component 
failure rates in addition to connection reliabilities and workmanship 
flaws in assigning a commonly used overall reliability figure of merit. 
The MTBF of the digital airborne computer can be expected to be in the 
1000's of hours (see footnote). With system MTBF's of this order of 
magnitude, the probability of experiencing one failure in a short time 
interval is very small. Experiencing two or more failures in the same 
short time interval is highly improbable. It then seems reasonable that 
one incurs a very small risk of undetected error if one designs test 
techniques assuming single failure, as long as testing is done at least 
periodically at short intervals. This intuitive approach is used, as 
more exact calculations are dependent on actual failure rates, numbers 
and types of components, specified confidence levels and assumed 
distributions. The single failure assumption seems to be justified for 
the example design, and will be made. Restated, the assumption asserts 
that the computer is constructed of highly reliable individual 
components so that essentially simultaneous failure of more than one 
component is so improbable that it can be reasonably neglected. The 
assumption is further justified economically by program limitations in 
that testing for multiple failures requires more added hardware at an 
unacceptable cost penalty. 

Footnote: The Autonetics D26J airborne computer with an estimated MTBF 
of 18,000 hrs; the Litton LC-728, 4,250 hrs; the Raytheon R-11, 
3,500 hrs; the CDC 5400, 2,500 hrs are examples from marketing brochures. 
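
The single failure argument can be illustrated numerically. The sketch 
below is not part of the original analysis: it assumes a constant 
failure rate, so that the number of failures in an interval is Poisson 
distributed with mean (interval / MTBF). The 2,500 hour MTBF is the 
lowest brochure figure quoted in the text; the one-second interval 
between tests is purely an assumed value chosen for illustration. 

```python
import math

def failure_probabilities(mtbf_hours, interval_hours):
    """Return (P[at least one failure], P[two or more failures]) in one interval.

    Assumes a constant failure rate, i.e. Poisson-distributed failures.
    """
    mean = interval_hours / mtbf_hours        # expected failures per interval
    p_at_least_one = -math.expm1(-mean)       # 1 - exp(-mean), computed accurately
    p_exactly_one = mean * math.exp(-mean)
    return p_at_least_one, p_at_least_one - p_exactly_one

# Assumed example: MTBF of 2,500 hours, tests one second apart.
p_one, p_multiple = failure_probabilities(2500.0, 1.0 / 3600.0)
print(f"P(at least one failure) = {p_one:.3e}")
print(f"P(two or more failures) = {p_multiple:.3e}")
```

Under these assumptions the chance of two or more failures between 
consecutive tests is many orders of magnitude smaller than the chance 
of one, which is the intuition behind the single failure assumption. 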

The foregoing examination of the nature of failure has led to some 
assumptions and conclusions relative to BIT design. First of all, 
logic will be assumed to fail to one of its two logic states, and not 
to some intermediate level. Solid failures will be of major interest; 
however, any failure leading to erroneous results is part of the detec- 
tion and isolation problem. Intermittent errors will be especially 
difficult to detect and isolate. Those erroneous results caused by 
non-hardware sources are important in that care must be taken to avoid 
condemning fault-free hardware as their source. Finally, the single 
error assumption will be made because little risk of undetected error 
is thereby incurred, and it presents the most reasonable approach from 
an economic standpoint. Now the possible test procedures available 
to meet the BIT objectives can be considered. 

IV. TEST PROCEDURE ALTERNATIVES 



A. GENERAL CONSIDERATIONS 

Comparison forms the basis of all test procedures. A norm against 
which comparison can be made must be available, either a priori or as 
a result of some generating process. The computer then produces a 
result which is suspect until verified against the norm. The variety 
of procedures available for testing a computer have this comparative 
process in common. 

Since thorough testing for all possible errors within the test area 
of interest is the objective, the different levels at which testing can 
be conducted should be identified. The computer can be functionally 
exercised by directing it to perform the operations for which it was 
designed on a variety of operands. The thoroughness of test can be 
evaluated by asking how many of the possible machine states are thereby 
verified. The totality of the possible combinations of inputs and 
outputs of the machine's logic circuits form the set of machine states. 
A gross functional check performed by exercising the computer's 
instruction set on a few operands can be seen to be less efficient and 
complete in verifying proper operation of all circuitry than 
comprehensive application of the set of inputs with comparison of 
resulting outputs against the set of unfailed machine output states. 
The one test method is superficial while the other is unnecessarily 
exhaustive. Each has been termed "100% testing" by industrial 
marketeers. The percentage of testing for this investigation will refer 
to the percentage of possible errors for which checking has been 
performed. The former method mentioned above would probably yield a low 
percentage while the latter would represent testing in excess of 100%. 
The closer to the logic level that testing is directed, the more 
thorough testing becomes. Selective testing at the logic level can be 
most efficient in identifying all the failures of interest. 

Not only must test procedures check for all possible failures of 
interest, they must also take care to avoid signalling error when none 
exists, as alluded to in Section III in the case of non-hardware-caused 
intermittent error. Testing which is not thorough leads to invalidation 
of the single failure assumption since some failures can go undetected. 
On the other hand, inappropriate error signals "crying wolf" can cause 
the pilot to take unnecessary abnormal action detrimental to mission 
completion. A significant advantage to testing conducted in the 
airborne environment is that not all errors identified airborne would be 
found if ground testing procedures were used instead. Consequently, 
ground maintenance personnel must have a high degree of confidence in 
airborne error indications since ground verification may be impossible. 
If a throwaway maintenance concept is in effect, good modules might be 
discarded because of inaccurate test results. 

Detection of error is only one part of the test problem. Isolation 
of the causative failure is the other. Test procedures differ in their 
ability to provide fault isolation. Early test procedures were designed 
to produce isolation to the single component level (if isolation was 
provided at all) since machines were constructed with discrete 
technology. The multicomponent package of the higher level technologies 
has made unnecessary such fine resolution procedures. For the example 
BIT design, the modular level is the level of interest. One is not 
concerned where within the module a failure is located; whether or not 
the modular package as an entity is faulty is of primary interest. With 
these general comments as a background, the various ways of categorizing 
test procedures can be explored. 

B. PROCEDURE CATEGORIES 

1. Normal vs. Marginal 

Diagnosis of existing solid errors should be the first order of 
business for any test procedure. Prediction of possible future failures 
would be a desirable supplement to the preceding tests to locate exist- 
ing errors. The former testing will be termed "normal" testing while 
the latter is called "marginal" testing. Normal testing will be the 
type pursued in the example test design. However, marginal testing 
conducted in conjunction with normal testing is generally valuable in 
furthering test objectives. 

Intermittent errors cause one of the biggest problems to the 
test designer. However, an intermittent failure causing inconsistent 
results can often be forced to become a solid failure with a resultant 
solid error manifestation through marginal testing techniques 
[Ref. 7]. Marginal testing tends to worsen the third failure mode 
discussed in Section III by further weakening already deteriorated 
components until they become solid failures of the more easily 
diagnosed first failure mode. Marginal testing consists of overstressing 
components through the application of abnormal conditions to cause the 
weak ones to fail prematurely during test instead of later during 
normal operations. Stressing, for example, can consist of over or 
under biasing transistors by a certain percentage of rated values. The 
danger of marginal testing is that existing intermittent failure can be 
masked by a rash of new failures should stressing be done carelessly 
or to needless extremes [Ref. 3]. When done carefully, however, 
marginal testing, in effect, predicts future failures by forcing them to 
occur at non-critical times. It also serves to identify and rid the 
machine of bothersome intermittent failure, thus increasing the 
degree of confidence accorded to airborne test results. 

Marginal testing is generally not appropriate airborne because 
of the time and extra equipment necessary to accomplish it. The 
accomplishment of marginal testing on the ground depends upon the 
maintenance concept. If periodic maintenance on the ground supplements 
airborne built-in testing, marginal testing should be part of this 
periodic procedure. In the example design, where no airborne repair is 
done, marginal testing can be accomplished whenever the computer is 
removed from the aircraft for repair of a solid failure identified by 
BIT. 

2. Software vs. Hardware 

Software testing refers to program-oriented, sequentially 
executed, periodic testing. The computer is directed by a program to 
accomplish a series of operations on supplied data. The results of 
these operations are then interpreted to provide diagnostic informa- 
tion. Since software testing is program-oriented, the level of testing 
(and therefore, to a certain extent, the efficiency of testing) is 
determined by the level of the programming language used. The lower the 
order of the programming language, the closer to the component level 
operations can be specified. Assembly language or its equivalent is 
most frequently used. 

A programmed test routine is sequentially executed, one instruc- 
tion after the next. The length of the program in number of instructions, 
the cycle time of the storage device containing the program, and the 
execution times of the instructions affect the time duration of the 
test. Test results can usually only be determined after a sequence of 
instructions has been executed and a result determined. This result 
is then compared against some previously calculated correct result to 
see if error has occurred. The same sequence of instructions might 
then be repeated with a different set of data and a different expected 
result. Comparison against the norm can take place automatically 
under program control after short sequences have been executed, or 
later upon examination of a printed output. 
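
The compare-against-stored-result loop described above can be sketched 
as follows. This is only an illustration, not the test program of the 
example design: the 24-bit word length, the function names, and the 
operand values are all assumptions made for the sketch. 

```python
MASK24 = (1 << 24) - 1          # assumed 24-bit word length

def add24(a, b):
    """Model of a fault-free 24-bit adder (carry out of the top bit is lost)."""
    return (a + b) & MASK24

# Each test case pairs operands with a previously calculated correct result.
ADD_CASES = [
    ((0x000001, 0x000001), 0x000002),
    ((0xFFFFFF, 0x000001), 0x000000),
    ((0x555555, 0xAAAAAA), 0xFFFFFF),
]

def run_self_test(operation, test_cases):
    """Run each case in sequence and return the cases whose result is wrong."""
    failures = []
    for operands, expected in test_cases:
        actual = operation(*operands)
        if actual != expected:          # comparison against the norm
            failures.append((operands, expected, actual))
    return failures

def faulty_add24(a, b):
    """Hypothetical failed adder: result bit 1 is stuck at logic 0."""
    return add24(a, b) & ~0x000002

print(run_self_test(add24, ADD_CASES))              # no failures reported
print(len(run_self_test(faulty_add24, ADD_CASES)))  # some cases expose the fault
```

Note that one of the three cases does not expose the assumed fault, 
which illustrates why several different data sets are normally needed. 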

Procedures for software testing differ widely. The detection 
and isolation functions can be accomplished concurrently or separately. 
In the separate case, an "executive" routine might be run periodically 
to determine in a gross sense whether or not the computer were 
exhibiting abnormal behavior. Once such behavior were sensed, a more 
detailed "diagnostic" routine might be run to determine the more exact 
location of the failure causing the error. Because of the limitations of 
the programming language in closely manipulating suspicious components, 
results might localize the failure to a region of the machine. 
Technicians would then locate the failure by hand probing. Such 
procedures tend to be inefficient, marginally effective, and always 
time-consuming. 

The characteristics of software testing can be evaluated with 
regard to the BIT objectives. A definite advantage is that software 
testing requires little added hardware (other than storage) to accomplish 
the checking function. Isolation after detection is difficult because 
of the periodic nature of testing. The test program typically occupies 
core memory (unless slower peripherals are available for temporary 
storage) and requires significant running time if many different test 
data are to be used in an attempt to make testing more comprehensive. 
Some functional degradation would occur when time is scarce, even 
when the test program is run on a periodic basis, because testing must 
share available time with the operational flight program execution. 
On the other hand, the shorter the test program and the longer the 
interval between tests, the greater the danger of using erroneous results 
of undetected failure and downgrading test efficacy by invalidating the 
single failure assumption. Test results are only known after several 
operations have been executed. This presupposes that the machine has 
not failed to the extent that it cannot execute instructions and give 
results necessary to locate the failure. Intermittent failure would 
tend not to be detected by software testing, eliminating the problem 
of signalling error and indicating failure when none exists. On 
balance, software testing did not look generally attractive for the 
example design. 

Hardware testing refers to checking accomplished by added 
circuitry. Such testing is characterized by simultaneous detection 
and isolation usually at the logic level, rapidly available results, 
and minimal degradation of operational capability. In general, the 
added checking hardware generates a basis for comparison with 
concurrently generated flight program results, and actually accomplishes 
the comparison at the logic level. Operation at the logic level provides 
excellent fault isolation capability. Results of the comparison are 
known essentially immediately. If a fault exists, it can be located and 
appropriate action taken prior to contamination of other data, or 
utilization of erroneous results. Hardware testing differs from 
software testing in that it checks the correct operation of the circuit 
being tested, but does not verify the correctness of the data being 
operated upon. The effect is that each circuit in a chain must be so 
checked if resultant data is to be certified. Further discussion of 
concurrent testing, characteristic of hardware testing, will be 
presented in the next subsection. 

By virtue of consisting of fewer components, checking circuitry 
is inherently more reliable as a whole than the logic it checks. How- 
ever, the components themselves are just as subject to failure as the 
components they test. To provide a high confidence of valid testing, 
therefore, one must consider the added test hardware itself as a poten- 
tial source of failure. Such hardware then becomes hard-core in the 
sense that its proper functioning must be verified before testing 
commences. Unlike the hard-core housekeeping and service hardware 
previously mentioned, checking hardware was considered to be part 
of the test problem. 

Hardware testing offered many benefits making it attractive 
as a means of meeting the example design objectives within program 
constraints. Its obvious disadvantage relative to software test was 
the much higher cost penalty incurred as a result of the expense of 
added hardware. A combination of hardware test to provide efficient 
test performance and software test to reduce expense offered a possible 
tradeoff for the example design. 

3. Continuous vs. Periodic 

Testing can be classified by its duration as either continuous 
or periodic. Continuous testing must also be concurrent (the results 
of test may be somewhat time-skewed) since ongoing operational 
computations occur simultaneously. Continuous testing is characteristic 
of hardware test. The effectively immediate failure detection provided 
by continuous testing tends to identify intermittent errors, where 
periodic testing does not. The single failure assumption is justified 
since failures are detected as soon as they occur. Operations can be 
halted upon occurrence of an error and the machine state at time of 
halt preserved. The process of "retry" or "restart" then attempts the 
last operation again to see if the same error recurs. Recurrence 
indicates a solid error and failure is flagged. Non-recurrence denotes 
an intermittent error, in which case the second correct attempt is used 
and operation continued. By noting the recurrence rate of intermittent 
error under the same conditions, intermittent hardware failure can often 
be distinguished from one-shot external sources. Hard-core house- 
keeping and service hardware is generally continuously tested. 
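
The retry logic described above can be sketched in a few lines. The 
sketch is illustrative only: `operation` stands for re-execution of the 
halted step from the preserved machine state, and `is_valid` stands for 
the checking hardware's verdict; both names, and the single-retry 
default, are assumptions rather than details of the example design. 

```python
def execute_with_retry(operation, is_valid, retries=1):
    """Classify an error as solid or intermittent by repeating the operation.

    Returns (status, result) where status is "ok", "intermittent" or "solid".
    """
    result = operation()
    if is_valid(result):
        return "ok", result
    for _ in range(retries):
        result = operation()            # same step, same preserved state
        if is_valid(result):
            return "intermittent", result   # use the second, correct attempt
    return "solid", None                # error recurred: flag the failure

# Hypothetical operation that fails once, then succeeds.
attempts = iter([7, 8])
print(execute_with_retry(lambda: next(attempts), lambda r: r == 8))
```

Counting how often the "intermittent" outcome recurs under the same 
conditions is one way the recurrence rate mentioned above could be 
noted. 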

Periodic test refers to checking conducted at specific intervals, 
such as software testing. The testing then time-shares with operational 
computation. Results are only determined after a number of sequential 
steps have been accomplished. Preservation of machine status for retry 
when periodic testing detects an error is generally not practical. 
However, if the period between tests is sufficiently brief, error halt 
can occur shortly after failure, minimizing the cumulative effect of 
error on post-failure computation. The single failure assumption is 
still valid if the period between tests is short. Intermittent errors 
will not be detected by periodic testing until they become solid. Even 
in a continuously tested machine, hard-core checking circuitry is more 
reasonably tested periodically. 

The unique nature of the added checking hardware providing 
continuous concurrent testing to the different logic circuits of the 
machine results in high cost. A tradeoff in favor of a periodic, 
interruptible test procedure exercised at frequent intervals appeared 
attractive for the example design. 

4. Deterministic vs. Non-Deterministic 

A deterministic test yields a definite answer to the question 
of whether or not an error exists. A non-deterministic test yields 
results which are interpreted statistically against an expected 
distribution to determine the probability of the existence of error. The 
terms are more often applied in relation to software testing procedures 
since hardware testing is always deterministic. Non-deterministic 
testing was not attractive for the example BIT design because of the 
requirement for a high degree of confidence in test results. Sta- 
tistical techniques were, however, found useful in selecting initial- 
izing data. 

5. Combinatorial vs. Sequential 

Seshu and Freeman [Ref. 45] classify the organization of 
testing into two different categories, combinatorial and sequential. 
A combinatorial testing procedure involves application of a fixed set 
of inputs to the machine with the output results being analyzed to 
identify failures. As an example, non-deterministic testing is 
combinatorial. A sequential procedure has no fixed set of tests which 
are applied. The result of the first test sequence determines which test 
sequence will be used next. Sequential testing is more efficient since 
selection leads to fewer tests. These two categories should not be 
confused with the often used classification of logic as combinatorial 
(combinational) or sequential. Combinatorial and sequential testing 
procedures clearly refer to classes of software testing and not to 
concurrent hardware test. 

C. ALTERNATIVES 

1. General 

The previous section presented several categories which can be 
used to describe test procedures. In practice, the specific proce- 
dures presented in the literature tend to fall simultaneously into 
several of the categories previously mentioned; all are a blend of 
alternate approaches having favorable characteristics relative to their 
intended applications. The discussion of specific alternatives re- 
quires a further cataloging effort, difficult because of the diversity 
of approaches to test and because of the aforementioned overlapping of 
categories. The discussion presented is not intended to be comprehen- 
sive; it is meant to demonstrate the diversity existing in the test 
field and to introduce some techniques which proved useful in devel- 
oping the specific blend of approaches best meeting the requirements of 
the example design. 

Since most of the test alternatives identified have been 
presented in the literature, the discussions are usually short, rapidly 
settling to a single level of interest. Some discuss the systems 
approach, giving overall techniques for testing the computer's different 
major units. Others have developed schemes for determining the optimal 
test sequences for checking one unit of the computer (e.g., the 
arithmetic unit, or the memory). Such schemes examine the states of the 
elements comprising the unit under consideration, the elements being 
identified as either fault-free or failed, and develop tests to yield 
the final diagnostic results on the entire unit. Still other techniques 
examine the states of the inputs and outputs of a single logic element, 
or block of elements (e.g., an AND gate or a multiplier block), with 
the goal of locating a failed element. The presentation of 
alternatives below will generally move from the system level to the 
logic-block level; however, the typing is loosely defined and often 
difficult. 

2. Coding 

A large variety of schemes and a significant body of theory have 
been developed in the literature relative to coding test techniques. 
Generally, coding represents a succinct way of supplying redundant 
information to provide a norm for comparison. Codes can be used to 
detect and correct single or multiple errors. The program constraints 
imposed on the example BIT design eliminate from consideration error- 
correcting codes and those requiring core memory storage. For this 
reason, only parity was considered potentially applicable for the 
example design. Its nature and possible use will be discussed next. 

Parity is the simplest error-detecting code, consisting of one 
redundant bit of information, making the sum of the information bits 
plus the parity bit either even or odd as desired. For a binary number 

\[ N = a_1 a_2 a_3 \cdots a_n \]

where a_i is the binary value for the ith bit location, parity P(N) can 
be expressed as 

\[ P_{even}(N) = \left( \sum_{i=1}^{n} a_i \right) \bmod 2 \]

and 

\[ P_{odd}(N) = \left( \sum_{i=1}^{n} a_i + 1 \right) \bmod 2 \]

The correct parity value for a data word is known a priori. Upon 
completion of an operation, the correct parity of the result is known 
and is generally attached to the result as an additional bit. The 
actual parity is then calculated and compared to the expected parity to 
determine whether or not error has occurred. 
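
Parity generation and comparison can be sketched as follows. The 
function names are illustrative and the words are arbitrary; the sketch 
takes the parity bit to be the value that makes the total count of 1 
bits (data plus parity) even, or odd when requested. 

```python
def parity_bit(word, odd=False):
    """Parity bit making the number of 1s in (word + parity bit) even or odd."""
    ones = bin(word).count("1")
    return (ones + (1 if odd else 0)) % 2

def parity_error(word, stored_parity, odd=False):
    """Compare the recalculated parity with the parity attached to the word."""
    return parity_bit(word, odd) != stored_parity

word = 0b1011                       # three 1 bits
p = parity_bit(word)                # even parity bit for this word
print(p)
print(parity_error(word ^ 0b0100, p))   # one bit in error: detected
print(parity_error(word ^ 0b0110, p))   # two bits in error: not detected
```

The last line shows the limitation of parity: an even number of bit 
errors leaves the parity unchanged and so escapes detection. 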

Parity has the capability of detecting odd numbers of errors, 
and therefore provides protection beyond the single error assumed. In 
the absence of the single error assumption, the risk of undetected 
multiple even errors can be calculated. Given an n bit word 

\[ N = a_1 a_2 a_3 \cdots a_n \]

resulting from operations, the binary value a_i of the ith bit position 
can have one of two states relative to failure (failure states): it 
is either correct or erroneous. The probability of undetected error 
P_ue is just the sum of the probabilities of multiple even errors. 
Assuming an instantaneous probability of error p in bit location i and 
independence between bit locations, the probability of k simultaneous 
errors is just p^k. Accounting for all combinations of ways k errors can 
occur in an n-bit word length (n even), the instantaneous probability of 
undetected error can be expressed as 

\[ P_{ue} = \binom{n}{2} p^{2} (1-p)^{n-2} + \binom{n}{4} p^{4} (1-p)^{n-4} + \cdots + \binom{n}{n} p^{n} (1-p)^{0} \]

\[ P_{ue} = \sum_{k=1}^{m} \binom{n}{2k} p^{2k} (1-p)^{n-2k}, \quad \text{where } m = n/2 \]

For n = 24 and p = 10^{-3}, 

P_ue ≈ 2.7 × 10^{-4}, or .027%, a very low risk. 
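
The quoted figure can be checked numerically. The sketch below sums the 
probabilities of every even, nonzero number of bit errors in an n-bit 
word under the independence assumption stated above. The value 
p = 10^-3 is taken here as an assumption because it is the value 
consistent with the stated result of .027% for n = 24. 

```python
from math import comb

def p_undetected(n, p):
    """Probability of an even, nonzero number of independent bit errors
    in an n-bit word, each bit being in error with probability p."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(2, n + 1, 2))

risk = p_undetected(24, 1e-3)
print(f"{risk:.3e}  ({100 * risk:.3f}%)")
```

The double-error term dominates the sum; the contribution of four or 
more simultaneous errors is several orders of magnitude smaller. 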

Parity can be useful in both software and hardware test 
procedures. It is often used to detect single errors in data 
transmissions. For the example design its potential use was as a 
hardware test where the correct parity was automatically present, or 
generated by the circuitry to be checked. A hardware parity generator 
and comparator could then be added to provide error indication. An 
example application might be to a feedback shift register which always 
generates a number with odd parity to which a parity generator and 
comparator could be added to verify proper operation. The generation and 
use of parity for comparison was only acceptable for the example design 
where core memory storage of parity bits was not required. 

3. Diagnostic Partitioning 

The general technique of diagnostic partitioning divides the 
computer into smaller entities, each of which can then be tested 
separately. Forbes, Rutherford, and Steiglitz [Ref. 13] present such 
a technique in which the computer is partitioned into "diagnostic 
subsystems," each having certain capabilities. The subsystem 
essentially is able to apply stimuli, sequentially execute a series of 
operations, receive and process inputs, and communicate diagnostic 
results of test to the outside world. The subsystems can then alternately 
diagnose each other. A sequence for system diagnosis at the subsystem 
level is developed. Their technique of partitioning a machine into 
essentially autonomous sections was found to be applicable in the 
example design. The test technique involves a periodic, software test 
with fault isolation provided by the order of operations. An interesting 
feature is the microprogramming of the test routine to provide closer 
manipulation of the logic for the reasons previously described in 
Section IV-A. 

The concept of diagnostic partitioning can be applied to a 
partitionable machine in a "bootstrap" fashion. One subsection is 
considered to be hard-core, and it is checked by hardware means, 
manually, or by software. An example of software test would be execu- 
tion of a small number of operations requiring only the hard-core sub- 
section to implement. Upon verification of the hard-core subsection, 
one then uses it to check the next subsection. The two checked sub- 
sections can then be used to check the next and so forth. This repre- 
sents a type of sequential testing (vice combinatorial) at the 
subsystem level. Manning [Refs. 31 and 32] describes a modification 
of such a technique. The difficulty with diagnostic partitioning is 
that the architectural designs of many computers do not facilitate 
partitioning. 
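
The "bootstrap" ordering can be sketched as a simple loop. This is an 
illustration only: the subsection names and the `check` callback are 
hypothetical, and the callback stands in for whatever hardware, manual, 
or software test is used to verify one subsection with the help of 
those already verified. 

```python
def bootstrap_diagnosis(subsections, check):
    """Verify subsections in order, starting from the hard-core subsection.

    check(name, verified) returns True when `name` passes its test, given
    the tuple of already verified subsections available to run that test.
    Returns (verified, failed); `failed` is None when every check passes.
    """
    verified = []
    for name in subsections:
        if not check(name, tuple(verified)):
            return verified, name       # failure isolated to this subsection
        verified.append(name)
    return verified, None

# Hypothetical machine in which the memory subsection has failed.
ORDER = ["hard_core", "arithmetic", "memory", "input_output"]
FAULTY = {"memory"}
print(bootstrap_diagnosis(ORDER, lambda name, verified: name not in FAULTY))
```

Because each check relies only on subsections that have already 
passed, the first failing check points to the faulty subsection under 
the single failure assumption. 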

4. Program Hierarchy Testing 

A system technique related to diagnostic partitioning examines 
the functional capabilities of the computer. A hierarchy of distinct 
software programs is used to functionally partition the machine, in 
contrast to the physical sectioning associated with the diagnostic 
partitioning of the previous section. A high level program periodically 
functionally tests the computer by exercising short routines using the 
machine instructions to grossly check the computer for proper 
operation. Examples of functional checks might be adding, multiplying or 
shifting. Such "executive programs" are not intended to be 
comprehensive or isolating; they detect errors in functions by comparing results 
obtained to previously stored expected results. Once an error has been 
identified, a "diagnostic routine" tailored to the type of functional 
error detected is executed to provide the isolation required for repair. 
While not comprehensive, such a technique allows frequent running of 
the short executive routine, while calling on the longer diagnostic 
routine only when error is sensed. Cohen and Whitaker [Ref. 7] describe 
such a procedure developed at Sylvania. Bashkow, Friets, and Karson 
[Ref. 3] divide the diagnostic process by hierarchy into a command 
checkout phase, used to assure that the machine is "breathing" (no 
gross malfunctions exist), and "executive", "testing", and "diag- 
nostic" phases to give more detailed checking at lower levels. The 
diagnostic programs used are microprogrammed to provide failure 
resolving capability. 

5. Software Exercise, Hardware Detection 

An interesting combination of testing techniques uses software 
routines to exercise the computer periodically and added hardware 
circuitry to detect errors. The hardware provides the level of 
detection resolution required. Software routines need only thoroughly 
exercise the machine, with no attention to order of execution for 
isolation being necessary. Fred Lee [Ref. 27] describes such a procedure 
in which the machine's operations are broken down into sequences of 
events, recognizable as pulses occurring in a specific order. The correct 
sequence is provided for the test routine and is compared against the 
actual sequence. Hardware monitoring devices provide the comparative 
function with non-coincidence signalling specific error. With an 18.2% 
increase in transistor count for test purposes, Lee claims 100% 
confidence in the device. This procedure is also described by Sellers, 
Hsiao and Bearnson [Ref. 43] under the title of "sequential logic 
latch checking." While Lee's procedure was not used, the idea of 
software exercising and hardware detection was of use for the example 
design. 

6. The Black-Box Approach 

The black-box approach refers to the process of setting the 
inputs of a network and observing the resultant outputs, useful in- 
formation thereby being derived without internal access to the net- 
work. A most extensive body of literature reports on varying 
schemes to obtain optimal, minimal sets of inputs to diagnose all 
possible errors internal to the network. With the growing use of 
multicomponent packages inaccessible internally (IC, MSI, and LSI 
technology), this test area has received renewed attention. Eldred 
[Ref. 12], in one of the earlier papers treating the black-box 
approach, discussed the derivation of minimal tests for a simple 
network of discrete components by evaluating the input conditions which 
should cause the network output to be "activated" or "inhibited." 
Results deviating from this norm indicated failure. Armstrong [Ref. 1] 
presented a procedure based on "path sensitizing" in which a given 
internal fault is selected and its effect is traced to the output for 
given input conditions. The procedure continues until all faults have 
been treated and the significant input and output patterns derived. 
The "truth table" or fault dictionary technique is similar in that a 
table of the expected outputs for given inputs and specified internal 
failures is derived. Comparison of combinatorial test results to the 
fault dictionary determines if an error has occurred, and where. 
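
A fault dictionary for a very small network can be sketched as 
follows. The network, out = (a AND b) OR c, is hypothetical and chosen 
only for illustration. Each line is assumed to fail to one of its two 
logic states, one failure at a time, in keeping with the assumptions 
made earlier; the fixed input set is simply all eight input 
combinations. 

```python
from itertools import product

LINES = ("a", "b", "c", "g")            # g is the internal AND-gate output
TESTS = list(product((0, 1), repeat=3)) # fixed (combinatorial) input set

def network(a, b, c, fault=None):
    """Evaluate out = (a AND b) OR c with an optional (line, stuck_value) fault."""
    def stuck(name, value):
        return fault[1] if fault and fault[0] == name else value
    a, b, c = stuck("a", a), stuck("b", b), stuck("c", c)
    g = stuck("g", a & b)
    return g | c

def response(fault=None):
    """Outputs observed for the whole fixed test set."""
    return tuple(network(a, b, c, fault) for a, b, c in TESTS)

def build_fault_dictionary():
    """Map each faulty output pattern to the failures that produce it."""
    dictionary = {}
    for line in LINES:
        for value in (0, 1):
            dictionary.setdefault(response((line, value)), []).append((line, value))
    return dictionary

def diagnose(observed, dictionary):
    """Return [] for the fault-free norm, else the candidate failures."""
    if observed == response():
        return []
    return dictionary.get(observed, [("unknown", None)])

table = build_fault_dictionary()
print(diagnose(response(("a", 0)), table))
```

Several failures can share one output pattern, so the dictionary may 
resolve an error only to a group of indistinguishable failures. That is 
acceptable when isolation is needed only to the package containing 
them. 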

The derivation procedure for a large block of logic can be 
tedious, even when computer aid is used. The requirements for memory 
can easily exceed availability in the analysis of large networks. Such 
difficulties have led to the development of simplifying methods for 
automating the analysis of large networks. There is wide agreement in 
the literature that the derivation of minimal input tests for a large 
block of logic must be automated. 

Sellers, Hsiao and Bearnson [Ref. 42] developed an algebraic 
technique based on Boolean difference to facilitate learning the effect 
of a change in state of a chosen input on the network output. The 
procedure involves logically Exclusive-ORing the Boolean output 
function, expressed in terms of the inputs, with the same function having 
the chosen input inverted. If the Boolean output function is 

\[ F(x_1, x_2, \ldots, x_i, \ldots, x_n) \]

where the x_i are the inputs for the system, they define the Boolean 
difference as 

\[ F(x_1, x_2, \ldots, x_i, \ldots, x_n) \oplus F(x_1, x_2, \ldots, \bar{x}_i, \ldots, x_n) \]

where the chosen input of interest is inverted in the second 
expression and \oplus represents the Exclusive-OR operator. The Boolean 
difference yields the input conditions for which the output will change 
state, given the chosen input state change. 
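
For a small function the Boolean difference can be evaluated by brute 
force over the truth table, as sketched below. This enumerative 
approach is an illustration rather than the algebraic technique of the 
cited authors, and the example function is hypothetical. Inputs are 
indexed from 0 in the code. 

```python
from itertools import product

def boolean_difference(f, n_inputs, i):
    """Return the input combinations for which inverting input i changes
    the output, i.e. where F(x) XOR F(x with input i inverted) equals 1."""
    sensitive = []
    for x in product((0, 1), repeat=n_inputs):
        flipped = x[:i] + (1 - x[i],) + x[i + 1:]
        if f(*x) ^ f(*flipped):
            sensitive.append(x)
    return sensitive

def example(a, b, c):
    """Hypothetical network: out = (a AND b) OR c."""
    return (a & b) | c

# Input a controls the output only when b = 1 and c = 0.
print(boolean_difference(example, 3, 0))
```

Any of the returned input combinations is a usable test for a failure 
on the chosen input, since a change on that input then shows up at the 
output. 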

Roth [Ref. 41] with his calculus of D-cubes expands on the 
above method, but with a more graphical technique to solve the 
sometimes formidable problem of accomplishing algebraic operations such 
as Exclusive-OR for complex functions. He first expresses the truth table of each 
element of the network in a succinct form and then gives rules for 
intersecting the tables of the individual elements to form the table 
describing the entire network. 

The usefulness of such techniques is reported by Galey, Norby 
and Roth [Ref. 14] in an earlier version of Roth's later technique. 
Four eight-bit input tests were automatically derived, the results 
of which would indicate whether any one of 102 possible internal 
failures had occurred (but not which one). This illustrates the con- 
cept of testing an internally inaccessible network for failure 
without interest in which specific component has failed.

An interesting contrast is offered by Maling and Allen [Ref. 30],
who test a network for failure with the purpose of identifying the
specific failed component. For each n-input component of the logic
net, 2^n represents the number of different input combinations. Only
n + 1 of these are necessary to show that each input in turn can control
the output and that the output can take either state. For a net of k
such components where the ith component has n_i inputs, they state that
the number of configurations C of the n + 1 required inputs per
component is

    C = k + \sum_{i=1}^{k} n_i



This number also represents the maximum number of tests required to
thoroughly check the circuit with component isolation. The lower
bound is determined if each test is efficient enough to eliminate half
the components from further consideration. The minimum number of tests
is then

    T_{\min} = 1 + \lceil \log_2 C \rceil

where \lceil \; \rceil indicates the next higher integer. From
experience, they state that the number of tests required is usually
approximately equal to the number of components.

7. Non-Duplicative Hardware Checking

Checking by adding hardware which does not duplicate the
circuitry being checked provides the benefits of hardware test without
the cost of duplication. Rao [Ref. 39] describes a method for checking
arithmetic-type operations in a processor through the use of residue
coding generated and employed by added hardware without storage to
identify errors but not to locate them. The residue code was used to
provide a high level of multiple-error checking capability not required
in the example design. The 1000-gate processor required 400 added gates
to check it, or a 40% increase in cost which would be unacceptably high
for the example design. Sellers, Hsiao and Bearnson have compiled a
comprehensive volume [Ref. 43] on error detecting logic, which is the
only one of its kind identified by the author. The cited reference
is an excellent source of non-duplicative hardware checking schemes.

The use of non-duplicative hardware schemes appeared attractive for
the example design, particularly for the hard-core circuitry included
in the test problem.
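
The general idea of residue checking of an arithmetic operation can be
sketched as follows. This is not Rao's design: the modulus of 3, the
function names, and the injected stuck-at fault are all assumptions
chosen for illustration.

```python
MODULUS = 3  # assumed check modulus; the source does not state Rao's choice

def residue(value):
    return value % MODULUS

def checked_add(a, b, adder=lambda x, y: x + y):
    """Add a and b, returning (result, error_flag) using a residue check."""
    result = adder(a, b)
    # Checker path: operates only on the residues of the operands
    predicted = (residue(a) + residue(b)) % MODULUS
    return result, residue(result) != predicted

def faulty_adder(x, y):
    # Hypothetical fault: bit 2 of the sum is stuck at one
    return (x + y) | 0b100

_, ok_error = checked_add(1000, 17)
_, bad_error = checked_add(1000, 16, adder=faulty_adder)
print(ok_error, bad_error)
```

The checker flags the faulty sum without indicating which bit failed,
which matches the "identify errors but not locate them" property
described above.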

8. Replication and Comparison 

When other schemes do not provide adequate checking, one can
replicate circuitry, operate the replicated portions in parallel and
compare the results, with any non-coincidence indicating error. While
the technique is expensive (and unacceptable for the example design)
when employed on a large scale, it is often the only technique
by which isolated small blocks of circuitry, or highly irregular
circuitry, can be thoroughly checked. For the example design,
duplication of small sections was very useful. The replicate and compare
concept is often applied when high reliability requirements force the
use of redundant hardware on a large scale. Switching to the unfailed
duplicate offers continued operation while the failed portion is
repaired. Automatic repair is not appropriate to this investigation, yet
it proceeds naturally from some of the methods found useful and
therefore represents a good topic for further related investigation
by others. Duplication and comparison, recognized as one of the most
effective test techniques, formed the basis for a unique application
in the example design of the diagnostic partitioning scheme described
earlier.

9. Probabilistic Method

A non-deterministic method which is periodic and combinatorial
is presented by Merwin [Ref. 33]. A block of combinatorial logic
(as opposed to sequential logic having feedback paths, and not to be
confused with combinatorial test) having many inputs is tested by first
establishing the expected distribution of output values. Each of the
possible combinations of input values is considered equally likely. The
output pattern resulting from each input pattern is derived. The
statistical appearance of a given logical value at each specific output
of the output set can then be determined. For example, if there are 16
possible input combinations (four inputs) and three outputs, output
number two may have the value logical one for eight of the input
combinations. The logical value one would then be expected 8/16 or 1/2
of the time at output number two. Merwin attaches a random number
generator to the inputs and tabulates the incidence of appearance of the
logical value one at each of the outputs. Deviation of the actual ratios
from the expected ratios may signify an error. If output two took the
value logical one only 1/16 of the time instead of the expected 1/2 of
the time, error would be likely. Decision criteria can be established
using statistical procedures. The random number generator as a source
of random bit patterns was useful in the example design.
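
The procedure can be sketched in a few lines. The logic blocks, the
fixed tolerance, and the function names below are hypothetical; they
only mirror the four-input, three-output example in the text and are
not Merwin's implementation or decision criteria.

```python
import random
from itertools import product

def expected_one_ratios(logic, n_inputs, n_outputs):
    """Fraction of all input patterns for which each output is logical one."""
    counts = [0] * n_outputs
    for bits in product((0, 1), repeat=n_inputs):
        for j, value in enumerate(logic(*bits)):
            counts[j] += value
    return [c / 2 ** n_inputs for c in counts]

def observed_one_ratios(logic, n_inputs, n_outputs, trials, rng):
    """Drive the block with random patterns and tally ones at each output."""
    counts = [0] * n_outputs
    for _ in range(trials):
        bits = [rng.randint(0, 1) for _ in range(n_inputs)]
        for j, value in enumerate(logic(*bits)):
            counts[j] += value
    return [c / trials for c in counts]

def good_block(a, b, c, d):
    # Hypothetical four-input, three-output block
    return (a & b, a ^ b ^ c ^ d, a | b | c | d)

def faulty_block(a, b, c, d):
    # Same block with output number two degraded to a four-input AND
    return (a & b, a & b & c & d, a | b | c | d)

expected = expected_one_ratios(good_block, 4, 3)
observed = observed_one_ratios(faulty_block, 4, 3, 4000, random.Random(1))
TOLERANCE = 0.1  # assumed decision threshold
suspect = [abs(o - e) > TOLERANCE for o, e in zip(observed, expected)]
print(expected, suspect)
```

In the assumed faulty block, output number two is one for only 1 of the
16 input patterns instead of 8, so its observed ratio falls far from the
expected 1/2 and it alone is marked suspect.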

V. THE EXAMPLE DESIGN



A. THE TEST CONCEPT 

The parent computer was divided into units: 

1. The processor unit - containing arithmetic logic and
general purpose registers.

2. The control unit - to provide control signals for direction
of operations in the processor unit.

3. The core memory unit - to provide storage of the flight
program and temporary data.

4. The input/output (I/O) unit - to provide interface between 
the computer and the equipment it serves. 

The I/O unit will not be considered in the present investigation. 

The proposed instruction set for the computer (to be termed the
macro-instruction set) provided for an extensive half-word/half-register
addressing and manipulation capability. Processing was to
be possible on 24-bit words (full-word operations), on the right or
left 12 bits of the 24-bit word separately (separate half-word
operations), or on the right and left 12 bits of the 24-bit word
simultaneously (parallel half-word operations). With little added
hardware and design effort, it appeared possible to configure the highly
regular logic of the processor unit into two autonomous halves, each
possessing multi-functional capabilities. This diagnostic partitioning
in effect provided a duplex redundant processor unit without the expense
of duplicating the hardware. This technique will be termed "split
duplication."

With the proposed high speed of the parent computer, sufficient
time was available when the machine was not performing its basic
operational mission (idle time) to time-share a periodically exercised
test procedure without imposing any functional degradation. This would
be particularly true if the test procedure, once initiated, could be
interrupted to return the computer to operational computation without
destroying test efficacy. Idle time was to be available every few
seconds, validating the single failure assumption through short
periodicity of test. The lower cost advantage of periodic,
program-oriented testing could thereby be enjoyed.

Two modes of operation were identified. In "normal" mode operation,
denoting mission operational computations, both halves of the computer
would be used together, making full-word, separate half-word, or
parallel half-word operations possible. In "test" mode, denoting
idle-time test exercising, only parallel half-word operations would be
possible. During test mode, the autonomous processor halves would be
loaded with identical half-word bit patterns. Identical parallel
operations would then be executed on the like data independently.
Comparison of the results would then be accomplished with
non-coincidence of the two halves indicating error. The advantages of
the superior duplication and compare method could be enjoyed without the
cost disadvantage of duplicated hardware.

The source of data words with which to initialize the two processor
halves during test mode remained to be resolved since core storage was
not acceptable. The possibility of using an inexpensive hardware
pseudo-random number generator, similar to the one used in Merwin's
probabilistic method, appeared to be an attractive option which was
compatible with the concept of interruptible test while requiring no
core storage. Random patterns would more nearly simulate inputs used
during normal mode operation. An argument can be made for "worst-case"
testing in which a small number of unusual bit patterns not normally
encountered in normal mode operation are used to stress the machine in
a worst-case manner. Such stressing appeared to be more appropriate
for marginal testing on the ground when such worst-case patterns might
be expected to hasten impending failure. Additionally, no "end-of-test"
point needed to be identified since the machine was to revert to
test mode at any time not required for normal mode operation. Finally,
the storage required for worst-case bit patterns ruled out their further
consideration.

The use of a pseudo-random number generator allowed the core memory
unit to be disconnected from the processor unit during test mode, and
made possible the core memory unit's separate checking concurrently
with, prior to, or subsequent to processor unit test. The control
unit, however, was required in test mode to supply the control signals
to direct the parallel half-word operations. Testing of the control
unit itself, and the location and execution of the exercising test
routine still needed resolution.

The issuance of accurate control signals by the control unit to the
processor unit is a prerequisite to correct computation. The control
unit was to be microprogrammed using a read-only-memory (ROM) as the
storage device. The control signals appropriate for executing the
macro-instruction set were to be hard-wired in the form of short
routines of the lower order micro-instructions. The hard-wiring
consisted of arrays of transistors implemented on a small number of
silicon chips, the whole comprising the ROM. The remainder of the
control unit consisted of the selection and sequencing circuitry required
to assure issuance of the proper signals in a timely manner.

Because of the standard packaged arrays available with which to
implement the ROM (the low risk nature of the program dictated use of
off-the-shelf hardware), sufficient unused storage capacity beyond the
requirements for the microprogrammed control signals was present to allow
storage of a microprogrammed test routine. Careful, efficient
microprogramming of the test sequences promised a much shorter test
routine requiring significantly less ROM storage than the comparable core
memory storage needed for an equivalent routine programmed using the
macro-instruction set. The inherent advantage of the lower order
micro-instruction set relative to thorough exercise of the computer at
the logic level is enjoyed by such a scheme. An additional significant
advantage for an interruptible, time-shared test routine is the much
shorter cycle time of the read-only-memory compared to the core memory.⁹
Note should be made here that test mode exercise of the processor unit
could be accomplished entirely independent of the core memory unit.

Since the control unit was to issue the control signals directing
the test routine, it became hard-core hardware whose proper
functioning had to be continuously assured. Hardware techniques for
continuous, concurrent testing of the control unit were therefore
essential to the test concept. As will become evident when the control
unit BIT design is discussed, the highly irregular nature of control
unit circuitry tends to necessitate hardware test techniques in any
case.

⁹ A typical core memory cycle time is 2 µsec while a typical ROM
cycle time is 200 nsec, 10 times faster.

With the test concept developed, the more detailed BIT design of 
each unit can now be examined. 

B. THE PROCESSOR UNIT 

With the exception of the power supply, considered to be hard-core
servicing hardware excluded from the test problem, no hard-core
hardware requiring continuous test was to be located in the processor
unit. The split duplication, periodic technique of testing the
processor unit could be expected to thoroughly check its operation.

The contents of the general processor module resulting from
partitioning the processor unit are shown in Figure 2. Figure 3a shows
the 24-bit data path divided into four-bit groups, with the double line
denoting the left and right half-word division. Two four-bit groups,
L_i and R_i, are physically located on the same module, providing eight
bits of the 24-bit wide data path. The remaining groups are likewise
associated on separate modules, a total of three identical modules
(see Figure 2) being necessary to implement a 24-bit path.
Modification of word length in eight-bit increments is possible, in
consonance with the objective of flexibility of word length. For
example, addition of a fourth identical module would easily convert the
processor to a 32-bit path width.

Emphasis should be placed on the fact that the description above
refers to a data path, and not to a single register or a single
functional circuit. The amount of hardware implemented on the 2A NAFI
module is dictated by its area and pin limitations, discussed in Section
II-B-2. The entire processor unit can then be thought of as consisting
of a series of three-module sets, the modules within each set being
identical. The total number of modules in the processor unit would be
a multiple of three.

Providing isolation to the modular level has only been briefly 
discussed so far. Identical half-word bit patterns are used to initial- 
ize the processor circuitry being tested. While the computer is in 
test mode, these bit patterns undergo parallel operations concurrently 
in the autonomous halves. The results of such operations should
therefore be identical at each point in the data path. Any difference
indicates that a fault exists. Non-coincidence is signalled by a
hardware comparator placed in each module to compare the autonomous 
halves' results. The required fault detection and isolation are hence 
achieved by the placement of the comparators in the data path at the 
modular level. Comparison takes place continuously during test mode at 
each clock pulse, so interruption to return to normal mode operation 
has no effect on test efficacy. 
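
Isolation to the modular level by per-module comparison can be sketched
as follows. The sketch assumes three modules, each holding one four-bit
group from each half, numbered from the least significant end; the
function names and the injected fault are hypothetical.

```python
GROUP_BITS = 4
GROUP_MASK = (1 << GROUP_BITS) - 1
MODULES = 3  # three identical modules give a 24-bit path

def split_into_modules(left_half, right_half):
    """Return [(L_1, R_1), (L_2, R_2), (L_3, R_3)], least significant first."""
    return [((left_half >> (GROUP_BITS * i)) & GROUP_MASK,
             (right_half >> (GROUP_BITS * i)) & GROUP_MASK)
            for i in range(MODULES)]

def module_errors(left_half, right_half):
    """One comparator per module: flag the module whose L_i and R_i differ."""
    return [l != r for l, r in split_into_modules(left_half, right_half)]

# Hypothetical fault: one bit flipped in the middle group of the left half
errors = module_errors(0x694 ^ 0x020, 0x694)
print(errors)
```

Only the second module's comparator reports non-coincidence, so the
failure is isolated to that module without identifying the failed
component inside it.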

The decentralized power supply located in each module consisted
of the final step of regulation required to provide the power level or
levels necessary in the module. The decoding of the control signals
was also accomplished in the associated module. Decode could thereby
be checked by the same technique as other processor hardware,
eliminating the necessity for the more difficult, costly continuous
checking of decode circuitry located in the control unit. Any failure in
the power supply serving the module or in the decoding function would
occur in the module. By tying the hard-core checking circuitry for
testing the local power supply (not treated herein) into the processor
module checking circuitry, a single error signal could be issued from
the module in case of failure. For the example design, the reason for
failure within the module did not need to be identified; only isolation
to the modular level was required. If a centralized power supply
provided fine power regulation and if decode were located outside the
module served, precautions would be necessary to ensure that failures
in these functions did not cause failure within the module to be
erroneously signalled. Confidence in the error signal once issued is
increased by the decentralizing scheme described.

In test mode, only parallel half-word operations are accomplished.
In normal mode, however, full-word and separate half-word operations
are also utilized. Differences in the execution of operations in the
two modes had to be identified to ensure that test procedures thoroughly
exercised the circuitry, and that test hardware did not degrade normal
mode operation. The carry forward found in adders, shift registers and
counters in the processor unit was the major such difference.

Figures 3b and 3c show the carries associated with parallel half-word
and with full-word operations, respectively. In the case of
parallel half-word operations, the carries between adjacent four-bit
groups in the two halves are identical. For example, the carries from
L_1 to L_2 and from R_1 to R_2 are the same. Since the L_1 and R_1
groups are located in the same module, the carries from the most
significant ends of L_1 and R_1 are identical when no fault exists.
These carries can then be compared, with non-coincidence indicating a
failure in that module.

One difficulty arises during test mode parallel half-word operations
when an error in a carry is detected; e.g., the carry from L_1 to L_2
differs from the one from R_1 to R_2. Error is signalled in the current
module. The differing carries, however, cause the bit contents of L_2
and R_2 in the next module to differ, and because they do not compare,
error is also signalled in the next module. This difficulty can be
resolved by inhibiting the error signal in module i+1 when an error
signal is issued from module i preceding it.

Another difficulty arises because during full-word operations in
normal mode, the bit contents of the groups L_i and R_i in the same
module may differ with no faults existing. Likewise, the carries
propagated from L_i to L_{i+1}, and from R_i to R_{i+1}, may also
differ. The error signal due to non-coincidence must only be allowed in
test mode, in which any non-coincidence is the result of failure. A
test-enable signal can be applied to checking circuitry in test mode.
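
The two resolutions just described, inhibition by the preceding module
and gating by a test-enable signal, can be combined in a short sketch.
It follows the wording of the text literally and is not derived from the
logic of Figures 4b and 4c; the function and argument names are
hypothetical.

```python
def module_error_signals(mismatch, test_enable):
    """Apply test-enable gating and the adjacent-module inhibit.

    mismatch[i] is True when module i's comparator (data or carry) sees
    non-coincidence. Module i+1 is inhibited when module i signals error.
    """
    signals = []
    previous = False
    for m in mismatch:
        error = test_enable and m and not previous
        signals.append(error)
        previous = error
    return signals

test_mode = module_error_signals([True, True, False], test_enable=True)
normal_mode = module_error_signals([True, True, False], test_enable=False)
print(test_mode, normal_mode)
```

How a chain of three mismatching modules should behave is not stated in
the text, so the sketch simply applies the stated rule to each adjacent
pair.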

It was also desirable to eliminate any gating from the inter-modular
carry paths to avoid propagation delays. Figure 4a shows the checker
circuitry added to each module. Figures 4b and 4c show possible logic
implementations of the desired truth tables for the carry checker and
error-inhibit respectively. Figure 5 shows the relationships between
two adjacent modules.

Note should be made that the error inhibition in the case of the
first difficulty discussed does not allow two adjacent modules to signal
error during the same periodic test iteration. Both carries from one
module to another are also assumed not to fail simultaneously, in which
case the comparison check would be passed in spite of existing failures.
Both of these cases are highly improbable and represent part of the
undetected failure risk accepted under the single-failure assumption for
built-in-test. In the case of simultaneous failures in adjacent modules,
only one is signalled. However, upon checkout after repair or
replacement of the signalling module, the second module would then
immediately indicate failure.

The fault-detecting circuitry described thus far does not distinguish 
between faults occurring in the module and faults occurring in the data 
transfer paths between that module and the previous one. Circuitry to 
provide such isolation could be added, and would consist of another 
comparator if the additional cost were acceptable. Unless the additional
comparator were incorporated, determining whether the fault exists in
the module or in the data transfer paths between that module and the
preceding one would have to be accomplished by ground maintenance
personnel.

The pseudo-random number generator has been very briefly treated.
Such a device is capable of providing long sequences of data words.
A 12-bit generator was required for the example design. Golomb [Ref. 15]
describes the design of a simple linear feedback shift register requiring
very little hardware. An example generator which adequately fulfills the
test requirements under consideration is included as Appendix A. The
maximum length sequence of 2^12 different patterns was obtained by
implementing a modulo two irreducible polynomial found in Peterson
[Ref. 36] and adding the nonlinearity of the important all-zero case.
The patterns so obtained met Golomb's tests of randomness in each bit
location. A self-checking pseudo-random number generator design
using more hardware is illustrated by Sellers, Hsiao, and Bearnson
[Ref. 43] under the title of "unit distance code parity checked counters."
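
A 12-bit generator of this general kind can be sketched as follows. The
polynomial x^12 + x^11 + x^10 + x^4 + 1 is an assumption chosen for
illustration and is not necessarily the one used in Appendix A; the
all-zero pattern is added by inverting the feedback when all bits except
the one being shifted out are zero.

```python
WIDTH = 12
MSB = WIDTH - 1

def next_pattern(state):
    """Advance a 12-bit shift register through all 2**12 patterns."""
    # Linear feedback for x^12 + x^11 + x^10 + x^4 + 1 (assumed polynomial)
    feedback = (state ^ (state >> 1) ^ (state >> 2) ^ (state >> 8)) & 1
    # Nonlinear term: when bits 11..1 are all zero, invert the feedback so
    # the all-zero pattern is spliced into the sequence
    if (state >> 1) == 0:
        feedback ^= 1
    return (state >> 1) | (feedback << MSB)

def sequence(start=0):
    """Return the 2**12 patterns visited from start, plus the state that follows."""
    state = start
    patterns = []
    for _ in range(1 << WIDTH):
        patterns.append(state)
        state = next_pattern(state)
    return patterns, state

patterns, final_state = sequence()
print(len(set(patterns)), final_state)
```

The linear register alone cycles through the 4095 non-zero patterns; the
added term inserts the all-zero pattern between 0x001 and 0x800, giving
the full 2^12 patterns before the sequence repeats.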

C. THE CONTROL UNIT 

The control unit was the least regular of the units to be self- 
tested. Additionally, it was hard-core, requiring continuous test to 
validate the control signals issued to the processor from the ROM. The 
split duplication test concept of periodically exercising the processor
unit during idle time presupposed a fault-free control unit able to 
issue appropriate control signals to direct test exercises whenever 
such idle time became available. Continuous testing of the control 
unit with added checking hardware would assure its fault-free avail- 
ability by signalling its unavailability upon occurrence of a failure. 
Partitioning the control unit to provide modular isolation of failure 
while minimizing the requirement for added hardware is the subject of 
this section. Since the control unit was the only unit requiring 
continuous test, it should be recognized that a large portion of the 
overall hardware penalty for providing BIT to the computer as a whole 
was to be paid in the control unit. 

Testing the control unit consisted of the following steps:

1. Testing the ROM for correct word content

2. Testing proper accessing of the ROM

3. Testing proper sequencing of accesses

4. Testing the checking hardware, which was also subject to
failure.

Testing the checking hardware was a problem common to all the units,
and it will be treated in Section E below. Figure 6 shows the
non-partitioned control unit organization for the parent computer.
Figure 7 illustrates the general modular partitioning and hardware added
for checking, which is described below.

Testing the ROM for proper word content will be examined first.
The control signals used to properly execute the flight program (and
the test routine) are stored in the ROM in the form of hard-wired bit
patterns called microwords.¹⁰ The contents of the microword can change
under failure, having a catastrophic effect on the control unit's ability
to issue proper signals and consequently on the computer's ability to
execute the flight program. The ROM, exclusive of addressing hardware,
will be assumed to be implemented in segments of 256 eight-bit words,
shown in Figure 8, although this implementation is not critical to
the test procedures described. The ROM microword length will be
assumed to be 48 bits, also not critical. The ROM is then implemented
in six segments, as illustrated in Figure 8.

Three fields of the microword format (see Figure 8) have test
significance:

1. Parity field (P) - one bit dedicated to parity of the entire
microword in which it is located.

2. Next address field (NA) - eight bits containing the next
address in the microprogram sequence (the next microword to
be executed). This field was necessary even without BIT.

3. Current address field (CA) - eight bits containing the address
of the microword in which it is located.

¹⁰ Depending on the method of microprogramming, a microword may include
several micro-instructions to control simultaneous operations in
the processor and elsewhere. A microprogram is executed one microword
at a time.


The six segments comprising the ROM are checked for correct word
content by parity. The addressing circuitry accesses only one
microword at a time. Parity is generated on the microword issued to the
48-bit hold register. This generated parity is then compared to the
proper parity stored in the parity field of the microword. Note should
be made that the hold register and the ROM sense amplifiers are also
checked by this procedure. The functions of parity generation and
comparison are combined in the parity checker shown in Figure 7.

The partitioning indicated shows all addressing and decoding
circuitry in a module separate from the ROM storage segments, sense
amplifiers and hold register. Divorcing the circuitry functionally
related to addressing in this manner allows fault isolation to the
modular level. This technique eliminates the ambiguity as to the
modular location of failure when a portion of the addressing function is
implemented in the same module as the ROM storage segments (a good
example is the address decode, often provided on the same MSI chip
as the storage devices).

The single-failure assumption made for the example design contends
that the probability of multiple simultaneous failures in systems
composed of components having inherent high component reliability is
so small that practical test design need not consider it. This
assumption was justified for discrete components and even for IC's, but
with the advent of MSI and LSI with their numerous closely-packed
components, it must be reconsidered. In the context of the present
subject, one must consider the higher probability of multiple failure
caused, for example, by a cracked silicon chip where several adjacent
components would be simultaneously affected. Odd parity, for instance,
will not detect an even number of simultaneous failures. The use of
parity for ROM content checking appears to be justified by the fact that
multiple failures would tend to affect more than one microword (to
continue the example, a chip crack probably would not lie straight along
the line of devices implementing a single microword). While one ROM
access might not catch an even number of failures in one microword, very
few subsequent accesses to different microwords would be necessary before
a single or multiple odd failure would be detected and signalled. So,
while the single failure assumption can be questioned for an MSI ROM
implementation, the use of parity can still be justified.
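
The parity check on the microword, and its blindness to an even number
of failures within one microword, can be illustrated with a short
sketch. Odd parity, the position of the P bit, and the sample microword
contents are assumptions made for illustration only.

```python
MICROWORD_BITS = 48
PARITY_BIT = 1 << (MICROWORD_BITS - 1)   # assumed position of the P field

def ones_count(word):
    return bin(word).count("1")

def with_odd_parity(payload):
    """Set the P bit so the whole 48-bit microword holds an odd number of ones."""
    payload &= PARITY_BIT - 1
    return payload if ones_count(payload) % 2 == 1 else payload | PARITY_BIT

def parity_error(microword):
    """Parity checker on the hold register output: True when parity is even."""
    return ones_count(microword) % 2 == 0

stored = with_odd_parity(0x1234_5678_9ABC)
single_failure = stored ^ 0x0000_0000_0010      # one bit changed
double_failure = stored ^ 0x0000_0000_0030      # two bits changed
print(parity_error(stored), parity_error(single_failure),
      parity_error(double_failure))
```

The single failure is signalled while the double failure in the same
microword passes the check, which is the case the argument above
accepts on the grounds that a physical defect would usually disturb
other microwords as well.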

Testing the addressing functions of the ROM is accomplished by
comparing the current address field (CA) of the microword with the step
counter contents. The step counter (or a second register if timing
requires the step counter to change prior to the issuance of the
microword being accessed) contains the address of the microword to which
access is being attempted. The eight bits of the CA field contain the
address actually accessed. Comparison of the two indicates whether an
addressing failure has occurred. The step counter, decode and drivers
are implemented on the same module. Non-comparison of the CA field
and the step counter therefore signals an error in this
address-function module. If the parity check in the ROM storage module
fails, indicating incorrect microword content or a failed hold register,
the error signal from the address function module is inhibited since the
contents of the CA field being used for comparison are now suspect.
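
The relationship between the two error signals can be sketched as
follows; the function and argument names are hypothetical and the
sketch makes no claim about the actual gating used in Figure 7.

```python
def control_unit_errors(step_counter, ca_field, parity_ok):
    """Return (rom_module_error, address_module_error) for one ROM access."""
    rom_module_error = not parity_ok
    # The CA field is only trusted when the microword passed its parity check
    address_module_error = parity_ok and (ca_field != step_counter)
    return rom_module_error, address_module_error

good_access = control_unit_errors(0x2A, 0x2A, parity_ok=True)
wrong_word = control_unit_errors(0x2A, 0x2B, parity_ok=True)
bad_content = control_unit_errors(0x2A, 0x2B, parity_ok=False)
print(good_access, wrong_word, bad_content)
```

In the last case only the ROM storage module is flagged, so a corrupted
CA field cannot cause the address-function module to be blamed.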

Proper sequencing of accesses to the ROM is the most difficult
check to accomplish. A description of the sequencing process in
general terms gives insight into the problem. The microprogram contained
in the ROM consists, in effect, of a series of "subroutines" in a
lower level language (the micro-instruction set), one "subroutine"
for each of the macro-instructions used to write the flight program
stored in the core memory. The flight program instruction word's
operation code field, representing the macro-instruction, is
analogously used as the "call" statement for its "subroutine". Since
the same micro-instructions may be used in different combinations to
implement different macro-instructions, the number of micro-instructions
is, in general, smaller than the number of macro-instructions.

Given a new flight program instruction word to be executed, the
first access to the ROM is dictated by the operation code field of
the instruction. This operation code is decoded as a selection of one
microword in the ROM. Subsequent accesses to the ROM until the
"subroutine" started by the operation code "call statement" is completed
are dictated by the NA field of the microword itself. At the end
of the sequence, the microword indicates that the sequence is complete
and a new flight program instruction word is fetched by the FETCH
CONTROL. Under certain conditions (such as repeats and branches),
the repeat counter and condition code register dictate that the NA
field be ignored and that the step counter (ROM address register) be
incremented or decremented to indicate the next ROM address to be
accessed. There are, then, several different sources of the next
ROM address to be accessed:

1. The operation code field of the program instruction word
found in the instruction register (U_1, U_2, U_3 in Figure 6)
dictates the initial access to the ROM in executing a given
program instruction word.

2. The NA field of the microword just accessed indicates the next
ROM address to be accessed except that:

3. The repeat counter and condition code register can dictate
direct modification of the step counter to yield the next ROM
address to be accessed, in which case the NA field of the last
microword accessed is ignored.

The SEQUENCE CONTROL selects the proper source of the next ROM
address to be accessed. It modifies the step counter as required by
the repeat counter or condition code register, and selects the proper
field (U1, U2, or U3) from the instruction register dependent on
whether half or full-word instructions are being executed. When the
NA field is selected as the source of the next address, its contents
could be held in a separate register until they could be compared with
the CA field of the microword actually accessed to see if a proper
accessing had occurred. However, because of the possible other sources
of the next address, it appeared that the proper functioning of the
SEQUENCE CONTROL, FETCH CONTROL, REPEAT COUNTER, and CONDITION CODE
REGISTER could only be assured by duplication, parallel operation, and
comparison for identical results. Only in this way did adequate
continuous checking of the proper sequencing of accesses seem feasible.

While the duplication and comparison test method should be reserved
for last consideration, as indicated in Section IV-C-8, its application
to the small logic sections described here appeared to be required to
provide continuous checking. Controls which are duplicated and compared
can be placed in any module as long as the duplex circuitry and
comparator are in the same module. Partitioning of this duplicated
circuitry was therefore dependent on 2A NAFI module limitations only.
The portion of Figure 7 labeled SEQUENCE CONTROL MODULE, then, could be
broken into several modules with isolation of faults to the modular
level still provided.
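
The selection among the three next-address sources, and the
duplication and comparison check applied to it, can be illustrated with
a short software model. This is only a sketch under stated assumptions:
the names SequenceInputs, next_rom_address and checked_next_address are
hypothetical, the signals are reduced to plain integers, and the actual
SEQUENCE CONTROL logic of the parent computer is not reproduced here.

```python
from dataclasses import dataclass


@dataclass
class SequenceInputs:
    """Hypothetical, simplified inputs to the sequence control logic."""
    new_instruction: bool  # a new flight program instruction word was fetched
    opcode_address: int    # ROM address decoded from the operation code field
    na_field: int          # NA field of the microword just accessed
    step_counter: int      # current contents of the ROM address register
    modify: int            # +1, -1 or 0, as demanded by repeat counter / condition codes


def next_rom_address(inp: SequenceInputs) -> int:
    """Select the source of the next ROM address (sources 1 to 3 in the text)."""
    if inp.new_instruction:
        return inp.opcode_address             # source 1: operation code field
    if inp.modify != 0:
        return inp.step_counter + inp.modify  # source 3: NA field is ignored
    return inp.na_field                       # source 2: NA field


def checked_next_address(primary, duplicate, inp):
    """Run two copies of the selection logic in parallel and compare them.

    Returns (address, error_flag); error_flag is True on any disagreement.
    """
    a = primary(inp)
    b = duplicate(inp)
    return a, a != b
```

In this model a fault in either copy shows up as a disagreement, which
is why the duplex logic and its comparator must share one module if the
error signal is to identify that module.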

D. THE CORE MEMORY UNIT 

Modification of an existing design to meet the requirements for 
a 24-bit word length, 8K core memory for the parent computer was con- 
sidered. The use of an already developed memory design appeared 
favorable in light of the short schedule and low risk nature of the 
program. Although the final choice of memory type and size was depend- 
ent on changing requirements and therefore not firm, the example 
design will consider modifications of the basic design shown in Figure 9 
to provide a BIT capability with fault detection and isolation to the 
modular level as the goal. The memory to be modified, termed the 
"standard memory unit" (SMU), was a 3D, coincident current, 32-bit 
word length, random access, 4K core memory. The example used serves well 
to demonstrate the factors involved in memory test. 

Reference 35 briefly summarizes the standard techniques for
functionally exercising a core memory. The functional exercisers listed
below check for proper operation of the memory as a black-box without
examining specific internal circuits. The standard functional
exercisers are:

1. Check-sum - checks proper memory loading. This check can be
accomplished using the flight program and constants stored in
the computer for the mission.

2. One's discrimination - checks the memory's ability to read and
write ones correctly. Memory buffer registers, sense amplifiers,
the core array, and driving circuits are checked by this test.

3. Zero's discrimination - checks the memory's ability to read
and write zeros correctly. The driving circuits are checked
by this test, as well as the sense amplifiers' sensitivity to
noise.

4. Addressing - checks whether or not each memory location can be
correctly accessed. In addition to those circuits tested by
the discrimination tests, the memory selection logic, decoders
and drivers are checked.

5. Checkerboard and Inverted Checkerboard - these tests produce
worst case noise conditions upon half-ready, which results in
maximum inhibit noise whenever a zero is written. The inhibit
noise from a cycle where zero was written can cause an error
during the read portion of the next cycle.

The discrimination and checkerboard tests are aimed at discovering
marginal conditions, and were not considered appropriate for airborne
testing. They would certainly be appropriate as part of pre- or
post-flight checkout on the ground, as discussed earlier in relation to
marginal testing in general. The check-sum and addressing tests, more
suited to discovering existing solid failure, appeared to be
appropriate for in-flight application.

The five tests enumerated above are program-oriented, periodically
exercised tests. Test techniques which require added hardware include
coding and separate checking circuitry for each circuit type. Coding,
principally parity, is popular for checking memories, but this
technique fell outside the program constraints for the example design.
Techniques for adding specialized circuitry to test the memory are
described in Ch. 14 of Ref. 43. The additional expense of the
circuitry and complete memory reconfiguration appeared inappropriate for
the design modification intended.

Modification of the modular partitioning of the SMU appeared
necessary to facilitate the test design if isolation to the modular
level were to be accomplished. While packaged employing standard NAFI
modules, the SMU did not use the 2A size, but rather the 1A and 1B
sizes.* The standard memory unit was implemented with the equivalent
of 152 1A NAFI modules. As evident in Figure 9, partitioning was done
by circuitry type; e.g., there are 16 1B size sense/inhibit modules,
one 1A address register module, and so forth. Several modules, of
different types, are involved in one memory access; an address
register module, address decoder module, timing control and timing
modules, and sense/inhibit modules are all involved in one access.
It is difficult to determine, while airborne, in which module the fault
lies once a fault is detected by a functional test alone. A unique way
of applying functional tests and some added hardware were required to
accomplish the modular isolation capability required.

* The number in the NAFI size designator refers to the horizontal
dimension of area (width), while the letter refers to the thickness.
1A is the smallest basic size, having unit standard width and unit
standard thickness. The 2A module is twice as wide as the 1A and
hence has twice the area, and the 1B is twice as thick as the 1A
[Ref. 10].

Sixteen 1B NAFI modules were used in the SMU to implement the
sense/inhibit functions for the 32 memory planes (32-bit word length).
This represents circuitry for two planes (bit locations) per 1B
module. An estimate (based on area limitation because of an
essentially discrete component implementation of sense/inhibit circuitry,
and allowing for added checking hardware) indicated that eight 2A
modules should amply suffice to implement the sense/inhibit functions
for a 24-bit word length. Three planes were to be served per 2A module.
It was envisioned that bit locations served by a module would be
adjacent. Figure 10 illustrates the scheme.

The implementation of the decode function for the SMU required two
modules dedicated to X select and two to Y select, each module serving
the entire core stack. An approach to partitioning which initially
appeared attractive was to partition the decode logic so that the X
and Y decode serving a smaller block of the core stack would be placed
in the same module. However, partitioning the decode in effect doubles
the logic required for every partitioning (e.g., placing the X and Y
decode for one quarter of the core stack in one module would, for the
entire core stack, entail quadruplicating the logic). Duplication and
comparison required only twice as much decode logic, and this method
was chosen. For example, the circuitry on one of the two X decode 1B
modules is duplicated, the duplex hardware being placed in the same
2A module. Figure 11 shows a decode module. Four 2A modules were
required for decode in the example design.

The address register also required duplication for separate test
by the duplication and comparison technique. Checking of power supplies,
transient protection, temperature tracking voltage sensors, timing, and
associated regulators has been excluded from consideration, as they
are hard-core housekeeping and service functions. The major areas
subject to failure during flight are the decoding, sense/inhibit and
select lines, cores, drivers, and amplifiers associated with accessing
the memory, which are checked by the procedures described herein.



Fault isolation to the functional module level plus the core stack
is provided by the test procedures described below. Faults occurring
in the sense/inhibit functions are isolated to a single sense/inhibit
module and core stack combination. Faults occurring in the addressing
function are isolated to a single address register module or decode
module. Faults occurring in the core stack are isolated to the core
stack only if all tests can be conducted. No airborne discrimination
between a single sense/inhibit module and the core stack appeared
feasible if the sense/inhibit test failed because later tests could not
then be confidently conducted. Such discrimination is easily
accomplished on the ground. While a higher degree of isolation would
be preferable, the level provided airborne closely focuses the efforts
of maintenance personnel and greatly reduces the time and cost of
maintenance. Sub core-stack isolation would probably not be useful since
the core stack must be treated as an entity by maintenance personnel.

Testing of the sense/inhibit functions should precede testing of
the decode function to ensure that the latter tests are valid when
conducted. The sense/inhibit functions serve the entire core stack;
that is, a single sense amplifier and a single inhibit driver serve the
same bit location in all the 8K words of the core stack. Each access
to the core memory exercises all the sense/inhibit circuitry since
all the bit locations of the word are involved. Solid failures result
in a stuck-at-one or stuck-at-zero condition in a bit location. To
isolate such fault manifestations to the sense/inhibit module or the
core stack serving the bit location, one must first detect the fault
and then relate it to the proper module. The test consists of
attempting to access a core memory location which contains a previously
stored constant. A location containing all one's tests for the
stuck-at-zero condition. Another location containing all zero's tests for
the stuck-at-one condition. Two core memory locations are therefore
dedicated for test use, one containing all one's and the other all
zero's. A second set of such tests using the same cells should be
performed to verify the restore operation; however, discrimination
between failures in the sense/inhibit module and the core stack would
still not be provided because of the possibility of a broken sense line
(which also looks like a sense amplifier stuck-at-zero). Relating the
failure to a specific sense/inhibit module is accomplished by checking
hardware added to each module. Assuming eight sense/inhibit modules
with three bit locations served per module (24-bit word), one adds a
three-bit register to each module (that is, in effect, a partitioned
output buffer register for the core memory). A three-bit comparator
(XOR) senses the failed condition when the three bit locations are not
identical. For example, a stuck-at-one failure in the fourth bit
location would be detected by accessing the memory location containing
all zero's. The three-bit register of the second sense/inhibit module
(serving the second three-bit group of the 24-bit word) would read
100, producing an error signal from the XOR circuit on the module.
Figure 12 shows the configuration of the sense/inhibit module.
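
The worked example in the preceding paragraph can be reproduced with a
small software model of the partitioned output buffer. The sketch below
is illustrative only: the function names are hypothetical, bit
locations are assumed to be numbered 1 to 24 from the left, and the
per-module comparator is modeled simply as "the three bits are not all
identical".

```python
WORD_BITS = 24
BITS_PER_MODULE = 3
ALL_ONES = (1 << WORD_BITS) - 1   # test constant for stuck-at-zero faults
ALL_ZEROS = 0                     # test constant for stuck-at-one faults


def module_groups(word):
    """Split a 24-bit word into eight 3-bit groups, leftmost group first."""
    groups = []
    for m in range(WORD_BITS // BITS_PER_MODULE):
        shift = WORD_BITS - BITS_PER_MODULE * (m + 1)
        groups.append((word >> shift) & 0b111)
    return groups


def failing_modules(word_read):
    """Return the 1-based numbers of modules whose three bits disagree."""
    return [m + 1 for m, g in enumerate(module_groups(word_read))
            if g not in (0b000, 0b111)]


def stuck_at_one(word, bit_location):
    """Model a stuck-at-one fault at a bit location (1 = leftmost bit)."""
    return word | (1 << (WORD_BITS - bit_location))


# Stuck-at-one in the fourth bit location, read from the all-zeros cell:
read_back = stuck_at_one(ALL_ZEROS, 4)
print(format(module_groups(read_back)[1], "03b"))  # register of module 2
print(failing_modules(read_back))                  # module(s) signalling error
```

Under these assumptions the second module's register reads 100 and
only that module raises its error signal, in agreement with the text.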

The exercising procedure for the decode function and the core
stack consists of check-summing over sections of memory. The core
memory contains the stored program and constants (unalterable part of
memory) which cannot change during flights, and a small section
(scratch pad) reserved for storage of data which can change in-flight.
Scratch pad test will be discussed separately. Check-summing is
accomplished by cumulatively adding the contents of all the cells
of the unalterable part of memory modulo 2^24,* the final sum accruing
in the accumulator. The expected check-sum (ECS) for the unalterable
part of memory has been previously calculated externally and stored in
the memory as a constant. Coincidence of the calculated sum and the
ECS (subtraction is often used to give an expected zero result)
indicates not only that the program stored in that part of memory is
intact, but also that the accessing process has been properly
accomplished.

Sequential access to each cell of the segment is attempted during
calculation of the sum; the sum will check with the ECS only if every
access has been properly executed. The accessing process thoroughly
exercises the core stack and its associated decode modules. Isolation
of faults to the decode module (by its internal comparator) or to the
core stack (by an incorrect check-sum) is thereby provided without
separate addressing tests, modification of cell contents, or storage
of any test results. The ECS can be stored at the end of the
unalterable part of memory. The core memory can also contain the memory
test program for check-summing, at the price of a few cells of core
storage. The memory test program can also be microprogrammed in the ROM
with other test sequences, and this alternative is preferable if
sufficient ROM space is available. It has been implicit throughout
the foregoing test procedure description that the control and processor
units have been tested prior to memory checking so that they can be
validly used to calculate the check-sums and do comparisons.

* Different schemes of handling the carry out of the most significant
place (e.g., addition to the least significant bit location)
reduce the probability of obtaining a proper check-sum when failure
exists to a negligibly low value.
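
A minimal sketch of the check-sum comparison follows, assuming a 24-bit
accumulator so that the running sum is reduced modulo 2^24. The
function names are hypothetical. The second routine illustrates one of
the alternative carry-handling schemes mentioned in the footnote
(adding the carry out of the most significant place back into the
least significant bit location); it is an assumption about how such a
scheme might be written, not a description of the parent computer's
adder.

```python
MASK = (1 << 24) - 1  # 24-bit word length


def check_sum(cells):
    """Cumulatively add all cells, discarding carries out of bit 24."""
    total = 0
    for word in cells:
        total = (total + word) & MASK
    return total


def check_sum_end_around(cells):
    """Variant: the carry out of the top bit is added back at the bottom."""
    total = 0
    for word in cells:
        total += word & MASK
        total = (total & MASK) + (total >> 24)
    return total & MASK


def memory_intact(cells, ecs):
    """Subtract the expected check-sum (ECS); a zero result means agreement."""
    return (check_sum(cells) - ecs) & MASK == 0


program = [0x000001, 0xFFFFFF, 0x123456]     # stand-in for unalterable memory
ecs = check_sum(program)                     # calculated before flight
corrupted = [0x000001, 0xFFFFFE, 0x123456]   # one bit dropped in one cell
print(memory_intact(program, ecs), memory_intact(corrupted, ecs))
```

Because every cell must be read to form the sum, a single wrong access
or a single altered cell changes the result, which is the property the
test relies on.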

The scratch pad is tested last, and it must be treated somewhat
differently, since its contents can change during the mission.
Consequently, an ECS could not be calculated and stored earlier for
comparison. In addition, there will be some data stored in scratch pad
which cannot be destroyed during test mode; e.g., positional data. The
same check-sum test technique can, however, still be applied if a small
block of scratch pad cells (block A cells in Figure 13) can be altered
during test. A like-sized block of stored program cells in the
unalterable part of memory (block B cells in Figure 13) is identified
and its ECS externally calculated and stored as a constant prior to
flight. Figure 13 illustrates the checking procedure for a 1K scratch
pad. 256 words of the scratch pad can be altered (block A cells). The
sequence of steps to test the 1K scratch pad is listed below:

1. Write contents of block B cells into block A.

2. Check-sum block A and compare to previously stored ECS.

3. Write unalterable scratch pad data of block C cells into block
A for temporary storage (block A cells and associated decode
modules have been verified by steps 1 and 2).

4. Write contents of block B cells into block C.

5. Check-sum block C and compare to previously stored ECS.

6. Restore data temporarily stored in block A into block C.

7. Continue the procedure with blocks D and E to complete scratch
pad test.

Note should be made that the size of block A can be quite small,
if necessary, with resulting increase in the number of data shuffles
required to complete scratch pad test. Alternate techniques to test
the scratch pad include coding, addition of more hardware, or perhaps
acceptance of an untested scratch pad in consonance with reasonable
test objectives discussed earlier.
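
The seven-step scratch pad procedure listed above can be sketched in
software as follows. The memory is modeled as a Python list, blocks are
hypothetical (start, length) pairs of equal length, and the block
labels C, D and E are taken from the text; none of this reflects the
actual microprogrammed routine.

```python
MASK = (1 << 24) - 1


def check_sum(cells):
    """Sum of the cells modulo 2**24."""
    total = 0
    for word in cells:
        total = (total + word) & MASK
    return total


def scratch_pad_test(mem, block_a, block_b, data_blocks, ecs):
    """Return the label of the first failing block, or None if all pass.

    mem         -- list of memory words (modified in place, then restored)
    block_a     -- (start, length) of the alterable scratch pad block
    block_b     -- (start, length) of the stored program block with known ECS
    data_blocks -- dict of label -> (start, length) for blocks C, D, E, ...
    """
    def read(block):
        start, length = block
        return mem[start:start + length]

    def write(block, words):
        start, length = block
        mem[start:start + length] = words

    pattern = read(block_b)
    write(block_a, pattern)                   # step 1
    if check_sum(read(block_a)) != ecs:       # step 2
        return "A"
    for label, block in data_blocks.items():  # step 7: repeat for each block
        write(block_a, read(block))           # step 3: save the live data
        write(block, pattern)                 # step 4
        passed = check_sum(read(block)) == ecs  # step 5
        write(block, read(block_a))           # step 6: restore the live data
        if not passed:
            return label
    return None
```

A smaller block A only changes the number of entries in data_blocks
and, for an unchanged ECS, requires block B to shrink to match.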

E. TESTING THE CHECKING HARDWARE 

The checking hardware represents hard-core circuitry whose proper
functioning must be assured before test results are considered valid.
The failure of checking circuitry can lead to the very undesirable
indication of error when none exists, or failure to flag existing
error. To provide assurance that checking hardware is fault-free, one
can

1. Provide redundant circuitry with reliability an order of 
magnitude higher than the circuitry it checks. 

2. Provide some earlier periodic check to verify proper
operation before test commences.

3. Verify only during periodic maintenance periods. 

The first alternative tends to be too expensive, at least doubling
the hardware cost of providing built-in test. The third alternative
reduces confidence in the test results to an unacceptably low level.
A periodic gross functional check of the checking circuitry is probably
most feasible, but at the expense of a few words of core storage.
Test bit patterns stored in core memory can be used to initialize the
circuitry so that the left and right half-words will differ. Error
therefore should be indicated. Identical half-word patterns can then be
introduced, in which case no error should be signalled. Such tests can
be made part of the periodic test sequence preceding test of the rest of
the computer. While it is recognized that comprehensive test has not
been achieved, one can be assured of a high degree of confidence in the
checking circuitry for minimal cost and effort.
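
The gross functional check of a comparator can be expressed compactly.
In the sketch below the checking circuit is modeled as a function that
returns True when it signals error; the two 24-bit test constants and
the function names are assumptions chosen for illustration, not values
taken from the example design.

```python
def half_word_comparator(word, bits=24):
    """Model of a checking circuit: signal error when the half-words differ."""
    half = bits // 2
    mask = (1 << half) - 1
    left = (word >> half) & mask
    right = word & mask
    return left != right


def checker_is_healthy(comparator):
    """Gross functional check of a checking circuit.

    The checker must flag a pattern whose half-words differ and must stay
    quiet for a pattern whose half-words are identical.
    """
    differing = 0xAAA555   # left half 0xAAA, right half 0x555
    identical = 0xAAAAAA   # both halves 0xAAA
    return comparator(differing) and not comparator(identical)


print(checker_is_healthy(half_word_comparator))   # working checker
print(checker_is_healthy(lambda word: False))     # checker stuck at "no error"
print(checker_is_healthy(lambda word: True))      # checker stuck at "error"
```

Two patterns suffice to expose a checker whose output is stuck in
either state, which is the sense in which the check is gross rather
than comprehensive.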
F. SEQUENCE OF TESTING

The sequence in which testing should be conducted for the parent
computer has been indicated in the separate sections. A summary is
useful to gain better perspective. For those portions periodically
tested, the priority should be:

1. Preflight marginal checks.

2. The checking circuitry (gross functional check). 

3. The processor unit. 

4. The core memory. 

a. Sense/inhibit function 

b. The core stack (check-sum) 

c. Scratch pad 

Those portions tested continuously include: 

1. Hard-core housekeeping and service functions (power supplies, 
clock, and so forth) 

2. The control unit 

3. Core memory (partially) 

a. Address register 

b. Decode function 

G. PROCESSING OF ERROR SIGNALS 

Some general comments should be made relative to the handling of
error signals once issued. If the goal of providing a separate error
signal from each module of the computer is achieved, a large number
of sources will be reporting. The reports must be interpreted and
processed to achieve the desired test goals.

First, the signal lines should be made "fail-safe"; that is, a
voltage should be present on each line except when it is reporting
failure. In this way, the line itself is checked since the absence
of a voltage will lead to investigation of the cause. The problem of
errors propagating from module to module, giving several false error
signals in addition to the accurate error signal, has been resolved
locally in the modules by error-inhibit precautions, as in the general
processor module checking circuitry. An error signal transmission path
should be provided separate from other computer output paths, and
by the most direct route to allow signals to be communicated under a
failed condition. The problem of signal interpretation remains to be
resolved.

A reasonable number of 128 modules with separate error lines will
be assumed. By the single failure assumption, only one of the 128 lines
will signal error at one time. With 80 pins limiting the 2A NAFI
module, two separate error processing modules would be necessary to
accommodate the required error inputs. Sixty-four error lines would
then input to each module, well within the 80 pin limitation. Encoding
circuitry in each module would encode the error source into binary
code, each error line having a unique binary number identifying it.
Seven output lines, then, would be necessary from each module, six to
encode one of 64, and one to indicate which module was sending the
encoded error message, giving a resolving power of one in 128. A total
of 71 input and output lines for each module, plus required power
supply and timing inputs, appears reasonable relative to pin
limitations. The encoded message would then be routed by direct means
to a central buffer register where the message of error location would
be preserved by some recording means for later use by maintenance
personnel. The message could also be used to turn off the central power
source to avoid the use of contaminated computations. The pilot would
be notified of error in accordance with test goals. Care would have
to be taken to ensure that failures in checking hardware, detected
during the pre-test periodic check, did not initiate computer shutdown.
In such cases, notification to the pilot that the error checking
capability of the computer had failed would allow him to continue
its use aware of the attendant risk.
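
The encoding arithmetic of the preceding paragraph can be checked with
a small model. The sketch assumes the fail-safe convention described
earlier (True stands for voltage present, i.e. no failure) and the
single failure assumption; the function names and the packing of the
module indicator into the seventh bit are hypothetical.

```python
LINES_PER_MODULE = 64
CODE_BITS = 6                                  # 2**6 = 64 lines per module
PINS_USED = LINES_PER_MODULE + CODE_BITS + 1   # 64 inputs + 7 outputs = 71


def encode_module(lines):
    """Encode the failing line of one error processing module.

    lines -- 64 booleans, True = voltage present (fail-safe, no failure).
    Returns the 6-bit number of the failing line, or None if none failed.
    """
    failing = [i for i, voltage in enumerate(lines) if not voltage]
    if not failing:
        return None
    return failing[0]   # single failure assumption: at most one entry


def encode_error(all_lines):
    """Encode one of 128 error lines as a 7-bit message (module bit + code)."""
    for module in range(2):
        start = module * LINES_PER_MODULE
        code = encode_module(all_lines[start:start + LINES_PER_MODULE])
        if code is not None:
            return (module << CODE_BITS) | code
    return None


lines = [True] * 128
lines[70] = False                        # line 6 of the second module fails
message = encode_error(lines)
print(message >> CODE_BITS, message & 0x3F)   # module indicator, line number
```

With 71 signal pins used per error processing module, nine of the 80
pins remain for the power supply and timing inputs.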
VI. DESIGN EVALUATION



Any test design is subject to the unique limitations imposed
by the parent design program, and the example used was no exception.
The design presented achieves the reasonable objectives established
for it in almost all instances:

1. A thorough self-test capability is provided for the parent 
computer in the airborne environment with a high confidence 
level for the test results. The risk of undetected error is 
kept negligibly low. 

2. The test design represents a unique series of tradeoffs,
optimizing the test performance per dollar for the short
schedule, low risk program. Maximum advantage was taken of
proposed architectural characteristics for the machine.
The hardware-software split duplication technique and the
proposed modification of an existing memory design illustrate
this.

3. Partitioning of the computer was achieved using the specified 
NAFI 2A module. Detection and isolation of the most important 
classes of faults to this modular level is automatically 
provided. This capability was achieved while allowing for 
flexible word length with minimal basic design changes. In 
the highly regular processor and memory units, the number of 
different module types was kept favorably low. 

4. Redundancy was not generally used. The capability of
significant test performance is provided for considerably less
than duplication of hardware.

5. The test design required very few cells of core storage,
such requirements being limited to a few constants and possibly
a memory test routine of short length. A simple pseudo-random
number generator to provide test bit patterns was
substituted for a large number of stored constants. The
coding techniques used required no core storage, leaving
maximum word length available for operational use.
Dissociating the core memory from the processor and control units
simplified the overall test problem.

6. Operational degradation was minimized throughout. An
interruptible microprogrammed routine using idle time and executed
at read-only memory cycle speed provides valid test without
infringing on operational availability.

Assignment of a specific figure of merit to the test design
must await choice of specific hardware, and the important
microprogramming of the test routine upon which much of the potential
test performance is predicated.

Various figures of merit can be assigned to a test design. Davis
[Ref. 9] developed a formula to assign a figure of merit to his
residue code arithmetic unit test scheme. Other figures relating to
cost, such as the 10% added hardware figure mentioned earlier, or in
more absolute terms the cost of BIT per gate tested, have been assigned.
The ultimate justification for a self-test capability is its measured
performance in detecting errors. A high confidence level that a
high percentage of potential failure sources have been checked seems
to the author to be the best figure of merit.

Evaluation of a self-test capability can be accomplished in
several ways. One technique which allows such evaluation is
simulation, during which faults can be artificially duplicated to verify
expected test response. Once the computer is built, actual faults can
be injected and the response measured. Failure history for a
production machine can also help in evaluating test efficiency. A
full-scale simulation of the parent computer with self-test circuitry
was envisioned.

The example design promises to provide significantly more test
capability per dollar than previous designs for similar computers. Its
potential beneficial effect on overall cost of ownership makes the
self-test capability provided by the design a very attractive feature.
Recognition of this fact should certainly result in greater future
emphasis on the relatively new field of built-in self-test design.

VII. SUGGESTED FURTHER INVESTIGATION



The subject of derivation of an optimal test routine using the 
micro-instruction set is an interesting one for future work. Many 
techniques, some briefly presented herein, suggest ways in which the 
states of a block of logic can be identified and related to the micro- 
instructions. Additionally, special instructions for test use only 
can be formulated, as needed. Computer-aided design fits well in this 
category. 

Once error signals from each module can be provided, the subject 
of automatic reconfiguration for continued operation after failure can 
be addressed. Ideally, the error signal from a "bad" module would be 
used to turn off the bad module and switch in a substituting module. 
For example, in the processor, the three identical modules of a set 
could be joined by a fourth identical module to be used in the event 
of failure. The ability to add such a reconfiguration capability in 
modular form might prove to be an attractive option available at extra 
cost dependent on the computer's intended use. 

The ability of a computer to continue to operate after failure in
a degraded mode using its remaining unfailed circuitry might be
investigated. For example, limited operations might continue at a slower
speed for high priority tasks related to aircraft survival (e.g.,
electronic countermeasures and navigation).

Lastly, the effects of continued technological advance on test
design and self-repair offer fruitful subjects for further
investigation.

APPENDIX A - PSEUDO-RANDOM NUMBER GENERATOR



The pseudo-random number generator shown below generates the
maximum length sequence of 2^12 different 12-bit binary patterns. The
numbers so produced are random in each bit position. The generator
implements the modulo 2 irreducible polynomial

+ X + 1 

as a linear feedback shift register. A different pattern is produced
at each clock pulse. The nonlinearity of the all-zero case is added
by the 11-input NAND gate (which, of course, can be implemented as
several gates instead of one).
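
The behavior of such a generator can be sketched in software. The
feedback polynomial is not fully legible in this copy, so the sketch
ASSUMES x^12 + x^6 + x^4 + x + 1, a commonly tabulated primitive
polynomial of degree 12; the tap positions, the shift direction, and
the polarity of the all-zero detector (standing in for the 11-input
gate) are likewise assumptions, and the appendix figure may differ.

```python
WIDTH = 12
MASK = (1 << WIDTH) - 1
TAPS = (12, 6, 4, 1)   # ASSUMED taps for x^12 + x^6 + x^4 + x + 1


def step(state):
    """Advance the 12-bit register by one clock pulse."""
    feedback = 0
    for tap in TAPS:
        feedback ^= (state >> (tap - 1)) & 1
    # Nonlinear term: when the 11 low-order stages are all zero, invert the
    # feedback. This splices the all-zero pattern into the linear sequence.
    if state & (MASK >> 1) == 0:
        feedback ^= 1
    return ((state << 1) | feedback) & MASK


def sequence(start=0):
    """Return the 2**12 successive patterns beginning at `start`."""
    patterns = []
    state = start
    for _ in range(1 << WIDTH):
        patterns.append(state)
        state = step(state)
    return patterns


patterns = sequence()
print(len(set(patterns)))   # number of distinct patterns in one full cycle
```

Without the nonlinear term the register would cycle through only the
2^12 - 1 non-zero patterns and would lock up if it ever reached the
all-zero state.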
The 2A NAFI Module
Figure 1

Figure 4

Modular Relationships
Figure 5

Control Unit Organization
Figure 6

Figure 9

Decode Module
Figure 11